Search results

1 – 10 of over 1000
Article
Publication date: 18 May 2020

Xiang Chen, Yaohui Pan and Bin Luo

One challenge for tourism recommendation systems (TRSs) is the long-tail phenomenon of ratings or popularity among tourist products. This paper aims to improve the diversity and…

Abstract

Purpose

One challenge for tourism recommendation systems (TRSs) is the long-tail phenomenon of ratings or popularity among tourist products. This paper aims to improve the diversity and efficiency of TRSs utilizing the power-law distribution of long-tail data.

Design/methodology/approach

Using Sina Weibo check-in data as an example, this paper demonstrates that the long-tail phenomenon exists in user travel behaviors and fits the long-tail travel data with a power-law distribution. To address data sparsity in the long-tail part and increase the recommendation diversity of TRSs, the paper proposes a collaborative filtering (CF) recommendation algorithm that incorporates the power-law distribution. Furthermore, by combining the power-law distribution with locality-sensitive hashing (LSH), the paper optimizes the user similarity calculation to improve the computational efficiency of TRSs.
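
As an illustration of the power-law fitting step described above, the following sketch (not the authors' code; the check-in counts are synthetic stand-ins for the Sina Weibo data, and the tail cutoff is an assumption) estimates a power-law exponent for long-tail popularity counts via a standard maximum-likelihood formula:

```python
# Minimal power-law fit sketch for long-tail popularity counts.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic long-tail data: counts drawn from a Pareto (power-law) distribution,
# standing in for per-spot check-in counts.
checkins = (rng.pareto(a=1.5, size=10_000) + 1.0) * 5.0

x_min = 5.0                                   # assumed lower cutoff of the tail
tail = checkins[checkins >= x_min]
# Continuous MLE for the exponent: alpha = 1 + n / sum(ln(x / x_min))
alpha = 1.0 + len(tail) / np.log(tail / x_min).sum()
print(f"estimated power-law exponent: {alpha:.2f}")
```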

Findings

The comparison experiments show that the proposed algorithm greatly improves recommendation diversity and computational efficiency while maintaining high recommendation precision and recall, providing a basis for further dynamic recommendation.

Originality/value

TRSs provide a better solution to the problem of information overload in the tourism field. However, because they are based on historical travel data over the whole population, most current TRSs tend to recommend hot and similar spots to users, lacking diversity and failing to provide personalized recommendations. Meanwhile, the large, high-dimensional sparse data in online social networks (OSNs) incur a huge computational cost when user similarity is calculated with traditional CF algorithms. In this paper, by integrating the power-law distribution of travel data with tourism recommendation technology, the authors’ work addresses the problem in traditional TRSs that recommendation results are overly narrow and lack serendipity, providing users with a wider range of choices and hence improving user experience in TRSs. Meanwhile, utilizing locality-sensitive hash functions, the authors’ work hashes users from high-dimensional vectors to one-dimensional integers and maps similar users into the same buckets, which enables fast nearest-neighbor search in high-dimensional space and solves the extreme sparsity problem of high-dimensional travel data. Furthermore, by applying the hashing results to the user similarity calculation, the paper greatly reduces computational complexity and improves the calculation efficiency of TRSs, which reduces the system load and enables TRSs to provide effective and timely recommendations for users.
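
A minimal sketch of the locality-sensitive hashing idea summarized above, under assumed toy dimensions and a random-projection hash family (the paper's exact hash functions and parameters are not specified here): each user's high-dimensional rating vector is reduced to a single integer bucket ID, and candidate neighbors are read off by bucket lookup instead of all-pairs similarity.

```python
# Random-projection LSH: hash users to integer buckets; similar users collide.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
n_users, n_items, n_bits = 1_000, 5_000, 16          # assumed toy dimensions
# Dense stand-in for a sparse user-item rating matrix (~1% entries non-zero).
ratings = rng.random((n_users, n_items)) * (rng.random((n_users, n_items)) < 0.01)

planes = rng.standard_normal((n_bits, n_items))      # random hyperplanes
signs = (ratings @ planes.T) > 0                     # one sign bit per plane
bucket_ids = signs.astype(np.int64) @ (1 << np.arange(n_bits))  # pack bits

buckets = defaultdict(list)
for user, bucket in enumerate(bucket_ids):
    buckets[int(bucket)].append(user)

# Candidate neighbors for user 0: only users that share its bucket.
print(buckets[int(bucket_ids[0])][:10])
```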

Details

Industrial Management & Data Systems, vol. 121 no. 6
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 13 March 2024

Rong Jiang, Bin He, Zhipeng Wang, Xu Cheng, Hongrui Sang and Yanmin Zhou

Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods, such as deep reinforcement learning and imitation learning, show…

Abstract

Purpose

Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods, such as deep reinforcement learning and imitation learning, show more promising potential to cope with the challenges brought by increasingly complex tasks and environments, and they have become a hot research topic in the field of robot skill learning. However, the contradiction between the difficulty of collecting robot–environment interaction data and the low data efficiency of these methods causes all of them to face a serious data dilemma, which has become one of the key issues restricting their development. Therefore, this paper aims to comprehensively sort out and analyze the causes of, and solutions for, the data dilemma in robot skill learning.

Design/methodology/approach

First, this review analyzes the causes of the data dilemma based on the classification and comparison of data-driven methods for robot skill learning. Then, the existing methods used to solve the data dilemma are introduced in detail. Finally, this review discusses the remaining open challenges and promising research topics for solving the data dilemma in the future.

Findings

This review shows that simulation–reality combination, state representation learning and knowledge sharing are crucial for overcoming the data dilemma of robot skill learning.

Originality/value

To the best of the authors’ knowledge, there are no surveys that systematically and comprehensively sort out and analyze the data dilemma in robot skill learning in the existing literature. It is hoped that this review can be helpful to better address the data dilemma in robot skill learning in the future.

Details

Robotic Intelligence and Automation, vol. 44 no. 2
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 9 May 2016

Shi Cheng, Qingyu Zhang and Quande Qin

The quality and quantity of data are vital for the effectiveness of problem solving. Nowadays, big data analytics, which requires managing an immense amount of data rapidly, has…

Abstract

Purpose

The quality and quantity of data are vital for the effectiveness of problem solving. Nowadays, big data analytics, which requires managing an immense amount of data rapidly, has attracted more and more attention. It is a new research area in the field of information processing techniques. It faces the big challenges of a large amount of data, high dimensionality, and dynamically changing data. However, such issues might be addressed with help from other research fields, e.g. swarm intelligence (SI), which is a collection of nature-inspired searching techniques. The paper aims to discuss these issues.

Design/methodology/approach

In this paper, the potential application of SI in big data analytics is analyzed. The correspondence and association between big data analytics and SI techniques are discussed. As an example of the application of the SI algorithms in the big data processing, a commodity routing system in a port in China is introduced. Another example is the economic load dispatch problem in the planning of a modern power system.

Findings

The characteristics of big data include volume, variety, velocity, veracity, and value. In SI algorithms, these features correspond, respectively, to large-scale, high-dimensional, dynamic, noisy/surrogate-based, and fitness/objective-driven problems, all of which SI techniques have solved effectively.
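
To make the correspondence concrete, here is a minimal particle swarm optimization (PSO) toy on a high-dimensional objective. It is an illustrative stand-in for the "large scale, high dimensions" case, not the port routing or load dispatch systems described in the paper; the objective and all parameters are assumptions.

```python
# Minimal PSO on a 50-dimensional sphere function (minimum 0 at the origin).
import numpy as np

rng = np.random.default_rng(2)
dim, n_particles, iters = 50, 30, 200
w, c1, c2 = 0.7, 1.5, 1.5                       # inertia and attraction weights

def objective(x):
    return (x ** 2).sum(axis=-1)

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), objective(pos)   # per-particle best
gbest = pbest[pbest_val.argmin()]               # swarm-wide best

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()]

print(f"best value after {iters} iterations: {pbest_val.min():.4f}")
```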

Research limitations/implications

In the current research, the example problem of the port is formulated but not yet solved, given the ongoing nature of the project. The example could be understood as advanced IT or data processing technology; however, its underlying mechanism could be the SI algorithms. This paper is the first step in the research to apply SI algorithms to a big data analytics problem. Future research will compare the performance of the method and fit it into a dynamic real system.

Originality/value

Based on the combination of SI and data mining techniques, the authors can have a better understanding of the big data analytics problems, and design more effective algorithms to solve real-world big data analytical problems.

Details

Industrial Management & Data Systems, vol. 116 no. 4
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 12 October 2023

Xiaoli Su, Lijun Zeng, Bo Shao and Binlong Lin

The production planning problem with fine-grained information has hardly been considered in practice. The purpose of this study is to investigate the data-driven production…

Abstract

Purpose

The production planning problem with fine-grained information has hardly been considered in practice. The purpose of this study is to investigate the data-driven production planning problem when a manufacturer can observe historical demand data with high-dimensional mixed-frequency features, which provides fine-grained information.

Design/methodology/approach

In this study, a two-step data-driven optimization model is proposed to examine production planning with the exploitation of mixed-frequency demand data. First, an unrestricted MIxed DAta Sampling approach with a group LASSO penalty (GP-U-MIDAS) is proposed. The use of high-frequency massive demand information is analytically justified to significantly improve predictive ability without sacrificing goodness-of-fit. Then, integrated with the GP-U-MIDAS approach, the authors develop a multiperiod production planning model with a rolling cycle. The performance is evaluated by forecasting outcomes, production planning decisions, service levels and total cost.
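
A hedged sketch of the U-MIDAS step under synthetic data: each month's daily observations are stacked as separate regressor columns and a sparse penalty selects the relevant lags. Plain LASSO (scikit-learn's LassoCV) stands in here for the paper's group LASSO penalty, and the data dimensions and coefficients are placeholders.

```python
# U-MIDAS design matrix from daily features, plus a sparse-penalty fit.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n_months, days_per_month, n_daily_feats = 60, 22, 4

daily = rng.standard_normal((n_months, days_per_month, n_daily_feats))
# U-MIDAS: flatten each month's daily observations into one wide regressor row.
X = daily.reshape(n_months, days_per_month * n_daily_feats)
true_beta = np.zeros(X.shape[1])
true_beta[:5] = [1.0, 0.8, 0.5, 0.3, 0.2]       # only a few lags matter
y = X @ true_beta + 0.1 * rng.standard_normal(n_months)   # monthly demand

model = LassoCV(cv=5).fit(X, y)
print(f"non-zero lag coefficients selected: {(model.coef_ != 0).sum()}")
```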

Findings

Numerical results show that the key variables influencing market demand can be completely recognized through the GP-U-MIDAS approach; in particular, the selection accuracy for crucial features exceeds 92%. Furthermore, the proposed approach performs well regarding both in-sample fitting and out-of-sample forecasting throughout most of the horizons. Taking the total cost and service level obtained under the actual demand as the benchmark, the mean deviations of the service level and total cost are reduced to less than 2.4%. This indicates that when faced with fluctuating demand, the manufacturer can adopt the proposed model to effectively manage total costs while maintaining an enhanced service level.

Originality/value

Compared with previous studies, the authors develop a two-step data-driven optimization model by directly incorporating a potentially large number of features; the model can help manufacturers effectively identify the key features of market demand, improve the accuracy of demand estimations and make informed production decisions. Moreover, demand forecasting and optimal production decisions behave robustly with shifting demand and different cost structures, which can provide manufacturers an excellent method for solving production planning problems under demand uncertainty.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 17 September 2021

Liang He, Haiyan Xu and Ginger Y. Ke

Despite better accessibility and flexibility, peer-to-peer (P2P) lending has suffered from excessive credit risks, which may cause significant losses to the lenders and even lead…

Abstract

Purpose

Despite better accessibility and flexibility, peer-to-peer (P2P) lending has suffered from excessive credit risks, which may cause significant losses to the lenders and even lead to the collapse of P2P platforms. The purpose of this research is to construct a hybrid predictive framework that integrates classification, feature selection, and data balance algorithms to cope with the high-dimensional and imbalanced nature of P2P credit data.

Design/methodology/approach

An improved synthetic minority over-sampling technique (IMSMOTE) is developed to incorporate randomness and probability into the traditional synthetic minority over-sampling technique (SMOTE) to enhance the quality of synthetic samples and the controllability of the synthetic process. IMSMOTE is then implemented along with grey relational clustering (GRC) and the support vector machine (SVM) to facilitate a comprehensive assessment of P2P credit risks. To enhance the associativity and functionality of the algorithm, a dynamic selection approach is integrated with GRC and then fed into the SVM's process of adaptive parameter adjustment to select the optimal critical value. A quantitative model is constructed to recognize key criteria via multidimensional representativeness.
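
For orientation, a minimal SMOTE-style oversampling sketch follows. The paper's IMSMOTE adds randomness and probability controls on top of this basic interpolation scheme; those refinements are not reproduced here, and the minority-class data below are toy placeholders.

```python
# Basic SMOTE-style oversampling: interpolate between a minority sample and
# one of its minority-class nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)              # idx[:, 0] is the point itself
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]     # a random true neighbor
        lam = rng.random()                     # interpolation weight in [0, 1)
        samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(samples)

X_minority = np.random.default_rng(4).standard_normal((40, 10))  # toy defaults
print(smote_like(X_minority, n_new=100).shape)                   # (100, 10)
```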

Findings

A series of experiments based on real-world P2P data from Prosper Funding LLC demonstrates that the proposed model outperforms other existing approaches. It is also confirmed that the grey-based GRC approach with dynamic selection succeeds in reducing data dimensions, selecting a critical value and identifying key criteria, and that IMSMOTE can efficiently handle the imbalanced data.

Originality/value

The grey-based machine-learning framework proposed in this work can be practically implemented by P2P platforms in predicting the borrowers' credit risks. The dynamic selection approach makes the first attempt in the literature to select a critical value and indicate key criteria in a dynamic, visual and quantitative manner.

Details

Grey Systems: Theory and Application, vol. 12 no. 3
Type: Research Article
ISSN: 2043-9377

Article
Publication date: 22 July 2021

Han Liu, Ying Liu, Gang Li and Long Wen

This study aims to examine whether and when real-time updated online search engine data such as the daily Baidu Index can be useful for improving the accuracy of tourism demand…

Abstract

Purpose

This study aims to examine whether and when real-time updated online search engine data such as the daily Baidu Index can be useful for improving the accuracy of tourism demand nowcasting once monthly official statistical data, including historical visitor arrival data and macroeconomic variables, become available.

Design/methodology/approach

This study is the first attempt to apply the LASSO-MIDAS model proposed by Marsilli (2014) to the field of tourism demand forecasting, in order to deal with the inconsistency in data frequencies and the curse of dimensionality caused by high-dimensional search engine data.
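
A hedged sketch of the mixed-frequency setup with synthetic series: daily search-index values for each month enter as separate columns beside the lagged monthly target, and a LASSO penalty shrinks the high-dimensional daily block. Scikit-learn's LassoCV stands in for the LASSO-MIDAS estimator, and all series and dimensions are placeholders.

```python
# Mixed-frequency nowcasting sketch: monthly target, daily regressors.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n_months, n_days = 72, 30
arrivals = rng.standard_normal(n_months).cumsum() + 100    # monthly target
daily_index = rng.standard_normal((n_months, n_days))      # daily search index

# One row per month: last month's arrivals plus this month's 30 daily values.
X = np.column_stack([np.roll(arrivals, 1), daily_index])[1:]
y = arrivals[1:]

model = LassoCV(cv=5).fit(X[:-12], y[:-12])                # hold out 12 months
mae = np.abs(model.predict(X[-12:]) - y[-12:]).mean()
print(f"out-of-sample MAE: {mae:.2f}")
```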

Findings

The empirical results in the context of visitor arrivals in Hong Kong show that combining daily Baidu Index data with monthly official statistical data produces more accurate nowcasting results when MIDAS-type models are used. The effectiveness of the LASSO-MIDAS model for tourism demand nowcasting indicates that such a penalty-based MIDAS model is a useful option when using high-dimensional mixed-frequency data.

Originality/value

This study represents the first attempt to progressively compare whether there are any differences between using daily search engine data, monthly official statistical data and a combination of the aforementioned two types of data with different frequencies to nowcast tourism demand. This study also contributes to the tourism forecasting literature by presenting the first attempt to evaluate the applicability and effectiveness of the LASSO-MIDAS model in tourism demand nowcasting.

Details

International Journal of Contemporary Hospitality Management, vol. 33 no. 6
Type: Research Article
ISSN: 0959-6119

Details

Machine Learning and Artificial Intelligence in Marketing and Sales
Type: Book
ISBN: 978-1-80043-881-1

Book part
Publication date: 24 March 2006

Valeriy V. Gavrishchaka

The increasing availability of financial data has opened new opportunities for quantitative modeling. It has also exposed limitations of the existing frameworks, such as low…

Abstract

The increasing availability of financial data has opened new opportunities for quantitative modeling. It has also exposed limitations of the existing frameworks, such as the low accuracy of simplified analytical models and the insufficient interpretability and stability of adaptive data-driven algorithms. I make the case that boosting (a novel ensemble learning technique) can serve as a simple and robust framework for combining the best features of the analytical and data-driven models. Boosting-based frameworks for typical financial and econometric applications are outlined. The implementation of a standard boosting procedure is illustrated in the context of the problem of symbolic volatility forecasting for IBM stock time series. It is shown that the boosted collection of generalized autoregressive conditional heteroskedasticity (GARCH)-type models is systematically more accurate than both the best single model in the collection and the widely used GARCH(1,1) model.
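
A hedged sketch of working with a collection of GARCH-type models, using the `arch` package on synthetic returns. A simple likelihood-weighted average of one-step variance forecasts stands in for the chapter's boosting procedure, which it does not reproduce; the model list and weighting rule are assumptions.

```python
# Combine one-step variance forecasts from several GARCH-type specifications.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(6)
returns = rng.standard_normal(1_000)                # toy percent-scale returns

specs = [dict(vol="GARCH", p=1, q=1),               # GARCH(1,1) benchmark
         dict(vol="GARCH", p=1, o=1, q=1),          # GJR-GARCH
         dict(vol="EGARCH", p=1, q=1)]

fits = [arch_model(returns, **s).fit(disp="off") for s in specs]
forecasts = np.array([f.forecast(horizon=1).variance.values[-1, 0] for f in fits])

ll = np.array([f.loglikelihood for f in fits])
weights = np.exp(ll - ll.max())                     # likelihood-based weights
weights = weights / weights.sum()
print(f"combined one-step variance forecast: {forecasts @ weights:.4f}")
```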

Details

Econometric Analysis of Financial and Economic Time Series
Type: Book
ISBN: 978-1-84950-388-4

Article
Publication date: 29 December 2022

Xiaoguang Tian, Robert Pavur, Henry Han and Lili Zhang

Studies on mining text and generating intelligence from human resource documents are rare. This research aims to use artificial intelligence and machine learning techniques to…

Abstract

Purpose

Studies on mining text and generating intelligence from human resource documents are rare. This research aims to use artificial intelligence and machine learning techniques to facilitate the employee selection process through latent semantic analysis (LSA), bidirectional encoder representations from transformers (BERT) and support vector machines (SVM). The research also compares the performance of different machine learning, text vectorization and sampling approaches on human resource (HR) resume data.

Design/methodology/approach

LSA and BERT are used to discover and understand the hidden patterns from a textual resume dataset, and SVM is applied to build the screening model and improve performance.
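
A minimal sketch of such a screening pipeline in scikit-learn, with a toy corpus and labels standing in for the resume dataset: TF-IDF vectorization, truncated SVD as the LSA step, and a linear SVM evaluated by cross-validation. The component choices and parameters are assumptions, not the authors' exact configuration.

```python
# LSA + SVM resume-screening pipeline on a toy corpus.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

resumes = [
    "python machine learning pandas data pipelines",
    "java spring microservices backend services",
    "deep learning pytorch computer vision research",
    "accounts payable invoicing bookkeeping excel",
    "nlp transformers text classification python",
    "office administration scheduling filing",
]
labels = [1, 1, 1, 0, 1, 0]            # 1 = matches a hypothetical opening

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),      # LSA: project into a latent topic space
    SVC(kernel="linear", C=1.0),
)
scores = cross_val_score(pipeline, resumes, labels, cv=2)
print("cross-validated accuracy:", scores.mean())
```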

Findings

Based on the results of this study, LSA and BERT prove useful in retrieving critical topics, and SVM can optimize prediction model performance with the help of cross-validation and variable selection strategies.

Research limitations/implications

The technique and its empirical conclusions provide a practical and theoretical basis, as well as a reference, for HR research.

Practical implications

The novel methods proposed in the study can assist HR practitioners in designing and improving their existing recruitment processes. The topic detection techniques used in the study provide HR practitioners with insights for identifying the skill set of a particular recruiting position.

Originality/value

To the best of the authors’ knowledge, this research is the first study that uses LSA, BERT, SVM and other machine learning models in human resource management and resume classification. Compared with the existing machine learning-based resume screening systems, the proposed system can provide more interpretable insights for HR professionals to understand the recommendation results through the topics extracted from the resumes. The findings of this study can also help organizations find a better and more effective approach for resume screening and evaluation.

Details

Business Process Management Journal, vol. 29 no. 1
Type: Research Article
ISSN: 1463-7154

Article
Publication date: 29 June 2021

Daejin Kim, Hyoung-Goo Kang, Kyounghun Bae and Seongmin Jeon

To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification (SIC), the North American…

Abstract

Purpose

To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification (SIC), the North American Industry Classification System (NAICS), and the Global Industry Classification Standard (GICS), the authors explore industry classifications using machine learning methods as an application of interpretable artificial intelligence (AI).

Design/methodology/approach

The authors propose a text-based industry classification combined with a machine learning technique by extracting distinguishable features from business descriptions in financial reports. The proposed method can reduce the dimensions of word vectors to avoid the curse of dimensionality when measuring the similarities of firms.
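
A hedged sketch of the text-based idea with toy business descriptions (not the authors' data or exact method): TF-IDF features are reduced with truncated SVD to limit dimensionality before measuring firm similarity, then firms are clustered into industry-like groups and compared by cosine similarity.

```python
# Text-based industry grouping from business descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "we design and sell consumer smartphones and wearable devices",
    "we manufacture semiconductors and integrated circuits",
    "we operate retail grocery stores nationwide",
    "we develop software for chip design and fabrication",
    "we run supermarkets and wholesale food distribution",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
embeddings = TruncatedSVD(n_components=3).fit_transform(tfidf)  # reduce dims
industries = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("cluster labels:", industries)
print("firm-firm similarity:\n", cosine_similarity(embeddings).round(2))
```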

Findings

Using the proposed method, the sample firms form clusters of distinctive industries, thus overcoming the limitations of existing classifications. The method also clarifies industry boundaries based on lower-dimensional information. The graphical closeness between industries can reflect the industry-level relationship as well as the closeness between individual firms.

Originality/value

The authors’ work contributes to the industry classification literature by empirically investigating the effectiveness of machine learning methods. The text mining method resolves issues concerning the timeliness of traditional industry classifications by capturing new information in annual reports. In addition, the authors’ approach can address the computational concerns of high dimensionality.

Details

Internet Research, vol. 32 no. 2
Type: Research Article
ISSN: 1066-2243
