Search results
1 – 10 of over 1000
Xiang Chen, Yaohui Pan and Bin Luo
Abstract
Purpose
One challenge for tourism recommendation systems (TRSs) is the long-tail phenomenon of ratings or popularity among tourist products. This paper aims to improve the diversity and efficiency of TRSs utilizing the power-law distribution of long-tail data.
Design/methodology/approach
Using Sina Weibo check-in data as an example, this paper demonstrates that the long-tail phenomenon exists in user travel behaviors and fits the long-tail travel data with a power-law distribution. To address data sparsity in the long-tail part and increase the recommendation diversity of TRSs, the paper proposes a collaborative filtering (CF) recommendation algorithm that incorporates the power-law distribution. Furthermore, by combining the power-law distribution with locality-sensitive hashing (LSH), the paper optimizes the user similarity calculation to improve the computational efficiency of TRSs.
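The power-law fit described above can be sketched as follows. This is a minimal illustration on synthetic long-tail data (the paper's actual Sina Weibo check-in data are not reproduced here), estimating the exponent by linear regression in log-log space on the rank-frequency curve:

```python
import numpy as np

# Hypothetical check-in counts per spot with a long-tail shape;
# a Zipf draw stands in for the real travel data.
rng = np.random.default_rng(0)
counts = np.sort(rng.zipf(a=2.0, size=1000))[::-1]

# Fit count ~ C * rank^(-alpha) by linear regression in log-log space.
ranks = np.arange(1, len(counts) + 1)
slope, intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
alpha = -slope  # estimated power-law exponent
```

A positive `alpha` with a roughly linear log-log rank-frequency plot is the usual visual check that the long-tail data are power-law distributed.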
Findings
The comparison experiments show that the proposed algorithm greatly improves recommendation diversity and calculation efficiency while maintaining high precision and recall, providing a basis for further dynamic recommendation.
Originality/value
TRSs provide a better solution to the problem of information overload in the tourism field. However, because they are based on historical travel data over the whole population, most current TRSs tend to recommend popular and similar spots to users, lacking diversity and failing to provide personalized recommendations. Meanwhile, the large, high-dimensional, sparse data in online social networks (OSNs) bring huge computational cost when calculating user similarity with traditional CF algorithms. In this paper, by integrating the power-law distribution of travel data with tourism recommendation technology, the authors' work solves the problem in traditional TRSs that recommendation results are overly narrow and lack serendipity, providing users with a wider range of choices and hence improving the user experience. Meanwhile, utilizing locality-sensitive hash functions, the authors' work hashes users from high-dimensional vectors to one-dimensional integers and maps similar users into the same buckets, which realizes fast nearest-neighbor search in high-dimensional space and addresses the extreme sparsity of high-dimensional travel data. Furthermore, by applying the hashing results to the user similarity calculation, the paper greatly reduces computational complexity and improves the calculation efficiency of TRSs, which reduces the system load and enables TRSs to provide effective and timely recommendations for users.
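The hash-to-buckets idea can be sketched minimally as follows, assuming signed random projections as the hash family (the abstract does not specify the paper's exact hash functions, and the user vectors here are toy data):

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_bits = 100, 5000, 16

# Sparse 0/1 visit vectors stand in for high-dimensional user profiles.
users = rng.random((n_users, n_items)) < 0.01
planes = rng.standard_normal((n_bits, n_items))  # random hyperplanes

# Each user hashes to a single integer: one sign bit per hyperplane.
bits = (users @ planes.T) > 0
keys = bits @ (1 << np.arange(n_bits))

# Users sharing a bucket key are candidate nearest neighbors, so
# similarity is only computed within buckets instead of over all pairs.
buckets = defaultdict(list)
for u, k in enumerate(keys):
    buckets[int(k)].append(u)
```

Because nearby vectors tend to fall on the same side of random hyperplanes, similar users collide in the same bucket with high probability, which is what reduces the pairwise similarity computation described above.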
Rong Jiang, Bin He, Zhipeng Wang, Xu Cheng, Hongrui Sang and Yanmin Zhou
Abstract
Purpose
Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods, such as deep reinforcement learning and imitation learning, show more promising potential to cope with the challenges brought by increasingly complex tasks and environments, and have become a hot research topic in the field of robot skill learning. However, the contradiction between the difficulty of collecting robot–environment interaction data and the low data efficiency causes all these methods to face a serious data dilemma, which has become one of the key issues restricting their development. Therefore, this paper aims to comprehensively sort out and analyze the causes of and solutions for the data dilemma in robot skill learning.
Design/methodology/approach
First, this review analyzes the causes of the data dilemma based on the classification and comparison of data-driven methods for robot skill learning. Then, the existing methods used to solve the data dilemma are introduced in detail. Finally, this review discusses the remaining open challenges and promising research topics for solving the data dilemma in the future.
Findings
This review shows that simulation–reality combination, state representation learning and knowledge sharing are crucial for overcoming the data dilemma of robot skill learning.
Originality/value
To the best of the authors’ knowledge, there are no surveys that systematically and comprehensively sort out and analyze the data dilemma in robot skill learning in the existing literature. It is hoped that this review can be helpful to better address the data dilemma in robot skill learning in the future.
Shi Cheng, Qingyu Zhang and Quande Qin
Abstract
Purpose
The quality and quantity of data are vital for the effectiveness of problem solving. Nowadays, big data analytics, which requires managing an immense amount of data rapidly, has attracted more and more attention. It is a new research area in the field of information processing techniques, and it faces major challenges: large data volumes, high dimensionality, and dynamically changing data. However, such issues might be addressed with help from other research fields, e.g., swarm intelligence (SI), a collection of nature-inspired search techniques. The paper aims to discuss these issues.
Design/methodology/approach
In this paper, the potential application of SI in big data analytics is analyzed, and the correspondence and association between big data analytics and SI techniques are discussed. As an example of the application of SI algorithms in big data processing, a commodity routing system at a port in China is introduced. Another example is the economic load dispatch problem in the planning of a modern power system.
Findings
The characteristics of big data include volume, variety, velocity, veracity, and value. In SI algorithms, these features can be represented, respectively, as problems of large scale, high dimensionality, dynamic change, noise/surrogates, and fitness/objectives, all of which have been solved effectively.
Research limitations/implications
In the current research, the example problem of the port is formulated but not yet solved, given the ongoing nature of the project. The example could be understood as advanced IT or data processing technology; however, its underlying mechanism could be the SI algorithms. This paper is the first step in the research to apply SI algorithms to a big data analytics problem. Future research will compare the performance of the method and fit it into a dynamic real system.
Originality/value
Based on the combination of SI and data mining techniques, the authors can have a better understanding of the big data analytics problems, and design more effective algorithms to solve real-world big data analytical problems.
Xiaoli Su, Lijun Zeng, Bo Shao and Binlong Lin
Abstract
Purpose
The production planning problem with fine-grained information has hardly been considered in practice. The purpose of this study is to investigate the data-driven production planning problem when a manufacturer can observe historical demand data with high-dimensional mixed-frequency features, which provides fine-grained information.
Design/methodology/approach
In this study, a two-step data-driven optimization model is proposed to examine production planning with the exploitation of mixed-frequency demand data. First, an Unrestricted MIxed DAta Sampling approach with a Group LASSO Penalty (GP-U-MIDAS) is proposed. The use of high-frequency, massive demand information is analytically justified to significantly improve predictive ability without sacrificing goodness-of-fit. Then, integrated with the GP-U-MIDAS approach, the authors develop a multiperiod production planning model with a rolling cycle. Performance is evaluated by forecasting outcomes, production planning decisions, service levels and total cost.
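The rolling-cycle step can be illustrated with a deliberately simplified base-stock rule; the forecast values and safety stock below are hypothetical stand-ins for the paper's optimization model, which additionally weighs costs and service levels:

```python
# Minimal rolling-cycle production planning sketch: each period, produce
# enough to bring stock up to (forecast + safety stock), never negative.
def rolling_plan(forecasts, init_inventory=0.0, safety_stock=10.0):
    inventory, plan = init_inventory, []
    for f in forecasts:
        qty = max(f + safety_stock - inventory, 0.0)
        plan.append(qty)
        # Inventory carried forward; realized demand assumed equal to
        # the forecast in this toy illustration.
        inventory = inventory + qty - f
    return plan

plan = rolling_plan([100.0, 120.0, 90.0])  # -> [110.0, 120.0, 90.0]
```

In a genuine rolling cycle, the forecasts would be refreshed each period from the GP-U-MIDAS regression before recomputing the plan for the remaining horizon.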
Findings
Numerical results show that the key variables influencing market demand can be completely recognized through the GP-U-MIDAS approach; in particular, the selection accuracy of crucial features exceeds 92%. Furthermore, the proposed approach performs well in both in-sample fitting and out-of-sample forecasting throughout most of the horizons. Taking the total cost and service level obtained under the actual demand as the benchmark, the mean deviations of the service level and total cost are reduced to less than 2.4%. This indicates that when faced with fluctuating demand, the manufacturer can adopt the proposed model to effectively manage total costs while maintaining an enhanced service level.
Originality/value
Compared with previous studies, the authors develop a two-step data-driven optimization model by directly incorporating a potentially large number of features; the model can help manufacturers effectively identify the key features of market demand, improve the accuracy of demand estimations and make informed production decisions. Moreover, demand forecasting and optimal production decisions behave robustly with shifting demand and different cost structures, which can provide manufacturers an excellent method for solving production planning problems under demand uncertainty.
Liang He, Haiyan Xu and Ginger Y. Ke
Abstract
Purpose
Despite better accessibility and flexibility, peer-to-peer (P2P) lending has suffered from excessive credit risks, which may cause significant losses to the lenders and even lead to the collapse of P2P platforms. The purpose of this research is to construct a hybrid predictive framework that integrates classification, feature selection, and data balance algorithms to cope with the high-dimensional and imbalanced nature of P2P credit data.
Design/methodology/approach
An improved synthetic minority over-sampling technique (IMSMOTE) is developed to incorporate randomness and probability into the traditional synthetic minority over-sampling technique (SMOTE), enhancing the quality of synthetic samples and the controllability of the synthetic process. IMSMOTE is then implemented along with grey relational clustering (GRC) and the support vector machine (SVM) to facilitate a comprehensive assessment of P2P credit risks. To enhance the associativity and functionality of the algorithm, a dynamic selection approach is integrated with GRC and then fed into the SVM's process of adaptive parameter adjustment to select the optimal critical value. A quantitative model is constructed to recognize key criteria via multidimensional representativeness.
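For reference, a minimal sketch of the classic SMOTE interpolation that IMSMOTE builds on; the randomness and probability controls specific to IMSMOTE are omitted here, and the minority points are toy data:

```python
import numpy as np

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a random minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (index 0 is x itself)
        d = np.linalg.norm(minority - x, axis=1)
        nbrs = np.argsort(d)[1:k + 1]
        z = minority[rng.choice(nbrs)]
        # new point lies on the segment between x and the neighbour
        samples.append(x + rng.random() * (z - x))
    return np.array(samples)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(minority, n_new=8)
```

Because each synthetic point is a convex combination of two existing minority samples, oversampling enlarges the minority class without simply duplicating records.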
Findings
A series of experiments based on real-world P2P data from Prosper Funding LLC demonstrates that the proposed model outperforms other existing approaches. It is also confirmed that the grey-based GRC approach with dynamic selection succeeds in reducing data dimensions, selecting a critical value and identifying key criteria, and that IMSMOTE can efficiently handle the imbalanced data.
Originality/value
The grey-based machine-learning framework proposed in this work can be practically implemented by P2P platforms in predicting the borrowers' credit risks. The dynamic selection approach makes the first attempt in the literature to select a critical value and indicate key criteria in a dynamic, visual and quantitative manner.
Abstract
Purpose
This study aims to examine whether and when real-time updated online search engine data such as the daily Baidu Index can be useful for improving the accuracy of tourism demand nowcasting once monthly official statistical data, including historical visitor arrival data and macroeconomic variables, become available.
Design/methodology/approach
This study is the first attempt to apply the LASSO-MIDAS model proposed by Marsilli (2014) to the field of tourism demand forecasting, dealing with the inconsistency in data frequencies and the curse-of-dimensionality problem caused by high-dimensional search engine data.
Findings
The empirical results in the context of visitor arrivals in Hong Kong show that the application of a combination of daily Baidu Index data and monthly official statistical data produces more accurate nowcasting results when MIDAS-type models are used. The effectiveness of the LASSO-MIDAS model for tourism demand nowcasting indicates that such a penalty-based MIDAS model is a useful option when using high-dimensional mixed-frequency data.
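The core idea of a penalty-based mixed-frequency regression can be sketched as follows: stack each month's daily observations as regressors for the monthly target, then solve the L1-penalised least squares (here with plain ISTA on synthetic data; Marsilli's actual LASSO-MIDAS specification and the Hong Kong data differ):

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, days = 60, 30
# Each row holds one month's daily search-index-like observations.
daily = rng.standard_normal((n_months, days))
# Synthetic monthly target driven by two specific days plus noise.
y = 2.0 * daily[:, 0] - 1.0 * daily[:, 5] + 0.1 * rng.standard_normal(n_months)

def lasso_ista(X, y, lam=0.1, iters=2000):
    """L1-penalised least squares via iterative soft-thresholding."""
    lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step from largest singular value
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta -= lr * X.T @ (X @ beta - y)                  # gradient step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta

beta = lasso_ista(daily, y)
```

The L1 penalty drives the coefficients of irrelevant daily lags toward zero, which is how the penalised model copes with the high dimensionality of daily search engine regressors.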
Originality/value
This study represents the first attempt to progressively compare whether there are any differences between using daily search engine data, monthly official statistical data and a combination of the aforementioned two types of data with different frequencies to nowcast tourism demand. This study also contributes to the tourism forecasting literature by presenting the first attempt to evaluate the applicability and effectiveness of the LASSO-MIDAS model in tourism demand nowcasting.
Abstract
Increasing availability of financial data has opened new opportunities for quantitative modeling. It has also exposed limitations of existing frameworks, such as the low accuracy of simplified analytical models and the insufficient interpretability and stability of adaptive data-driven algorithms. I make the case that boosting (a novel ensemble learning technique) can serve as a simple and robust framework for combining the best features of the analytical and data-driven models. Boosting-based frameworks for typical financial and econometric applications are outlined. The implementation of a standard boosting procedure is illustrated in the context of symbolic volatility forecasting for the IBM stock time series. It is shown that the boosted collection of generalized autoregressive conditional heteroskedastic (GARCH)-type models is systematically more accurate than both the best single model in the collection and the widely used GARCH(1,1) model.
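The greedy, stagewise flavour of boosting over a pool of base models can be sketched as L2-boosting; the base "forecasters" below are toy stand-ins, not fitted GARCH models:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
t = np.linspace(0, 6, n)
target = np.sin(t) + 0.1 * rng.standard_normal(n)

# Columns = predictions from different base "models" in the collection.
base = np.column_stack([np.sin(t), np.cos(t), rng.standard_normal(n)])

def l2_boost(F, y, steps=50, nu=0.5):
    """At each step, fit the current residual with the single best base
    forecast and add a shrunken (learning-rate nu) copy of it."""
    pred, coefs = np.zeros_like(y), np.zeros(F.shape[1])
    for _ in range(steps):
        resid = y - pred
        fits = F.T @ resid / np.sum(F * F, axis=0)  # per-column LS coefficient
        j = np.argmax(np.abs(fits))                 # best-fitting base model
        coefs[j] += nu * fits[j]
        pred += nu * fits[j] * F[:, j]
    return coefs, pred

coefs, pred = l2_boost(base, target)
```

The ensemble weights the useful base model heavily and effectively ignores the noise column, mirroring the claim that the boosted collection beats any single model in it.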
Xiaoguang Tian, Robert Pavur, Henry Han and Lili Zhang
Abstract
Purpose
Studies on mining text and generating intelligence on human resource documents are rare. This research aims to use artificial intelligence and machine learning techniques to facilitate the employee selection process through latent semantic analysis (LSA), bidirectional encoder representations from transformers (BERT) and support vector machines (SVM). The research also compares the performance of different machine learning, text vectorization and sampling approaches on the human resource (HR) resume data.
Design/methodology/approach
LSA and BERT are used to discover and understand the hidden patterns from a textual resume dataset, and SVM is applied to build the screening model and improve performance.
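The LSA step can be sketched on a toy term-by-resume count matrix (the actual HR dataset and the BERT embeddings are not reproduced here): a truncated SVD places documents in a low-dimensional topic space where similar resumes end up close together.

```python
import numpy as np

# Rows = terms, columns = 4 toy "resumes".
X = np.array([
    [2, 1, 0, 0],   # "python"
    [1, 2, 0, 0],   # "ml"
    [0, 0, 2, 1],   # "sales"
    [0, 0, 1, 2],   # "crm"
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of latent topics to keep
docs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in k-dimensional topic space

def cos(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In this toy space the two technical resumes (columns 0 and 1) are nearly identical, while a technical and a sales resume are nearly orthogonal; a downstream SVM would be trained on these latent vectors rather than on the raw term counts.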
Findings
Based on the results of this study, LSA and BERT prove useful in retrieving critical topics, and SVM can optimize the prediction model's performance with the help of cross-validation and variable selection strategies.
Research limitations/implications
The technique and its empirical conclusions provide a practical and theoretical basis and reference for HR research.
Practical implications
The novel methods proposed in the study can assist HR practitioners in designing and improving their existing recruitment process. The topic detection techniques used in the study provide HR practitioners insights to identify the skill set of a particular recruiting position.
Originality/value
To the best of the authors’ knowledge, this research is the first study that uses LSA, BERT, SVM and other machine learning models in human resource management and resume classification. Compared with the existing machine learning-based resume screening system, the proposed system can provide more interpretable insights for HR professionals to understand the recommendation results through the topics extracted from the resumes. The findings of this study can also help organizations to find a better and effective approach for resume screening and evaluation.
Daejin Kim, Hyoung-Goo Kang, Kyounghun Bae and Seongmin Jeon
Abstract
Purpose
To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification (SIC), the North American Industry Classification System (NAICS) and the Global Industry Classification Standard (GICS), the authors explore industry classifications using machine learning methods as an application of interpretable artificial intelligence (AI).
Design/methodology/approach
The authors propose a text-based industry classification combined with a machine learning technique by extracting distinguishable features from business descriptions in financial reports. The proposed method can reduce the dimensions of word vectors to avoid the curse of dimensionality when measuring the similarities of firms.
Findings
Using the proposed method, the sample firms form clusters of distinctive industries, thus overcoming the limitations of existing classifications. The method also clarifies industry boundaries based on lower-dimensional information. The graphical closeness between industries can reflect the industry-level relationship as well as the closeness between individual firms.
Originality/value
The authors’ work contributes to the industry classification literature by empirically investigating the effectiveness of machine learning methods. The text mining method resolves issues concerning the timeliness of traditional industry classifications by capturing new information in annual reports. In addition, the authors’ approach can solve the computing concerns of high dimensionality.