Search results
1 – 10 of over 4000

Guanzheng Wang, Yinbo Xu, Zhihong Liu, Xin Xu, Xiangke Wang and Jiarun Yan
Abstract
Purpose
This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to address the low sample efficiency of DRL and speed up training, and to improve the applicability and reliability of DRL-based approaches in multi-UAV control problems.
Design/methodology/approach
In this paper, a fully distributed collision detection and avoidance approach for multi-UAV based on DRL is proposed. A method that integrates human experience into policy training via a human experience-based adviser is proposed. The authors propose a hybrid control method which combines the learning-based policy with traditional model-based control. Extensive experiments including simulations, real flights and comparative experiments are conducted to evaluate the performance of the approach.
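The hybrid control idea described above, combining a learned policy with a traditional model-based controller, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stand-in policy, the safety threshold and the repulsive fallback law are all hypothetical.

```python
import numpy as np

SAFE_DIST = 2.0  # hypothetical minimum separation (m)

def learned_policy(state):
    # Stand-in for a trained DRL policy: fly straight toward the goal.
    return state["goal"] - state["pos"]

def model_based_avoidance(state):
    # Classical repulsive controller: push away from the nearest neighbour.
    away = state["pos"] - state["nearest_uav"]
    return away / (np.linalg.norm(away) + 1e-6)

def hybrid_control(state):
    """Use the learned policy unless a neighbour is dangerously close,
    in which case fall back to the model-based avoidance law."""
    dist = np.linalg.norm(state["pos"] - state["nearest_uav"])
    if dist < SAFE_DIST:
        return model_based_avoidance(state)
    return learned_policy(state)

state = {"pos": np.array([0.0, 0.0]),
         "goal": np.array([10.0, 0.0]),
         "nearest_uav": np.array([1.0, 0.0])}
cmd = hybrid_control(state)  # neighbour at 1 m < 2 m, so avoidance engages
```

The design choice is that the model-based law overrides the learned one only inside the safety margin, which is one simple way to obtain the "safer flight" property the findings mention.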
Findings
A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that integrating human experience significantly accelerates the training process and yields a higher mean episode reward than the pure DRL method. The experimental results show that the DRL method with human experience integration significantly outperforms the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight brought by the hybrid control method has also been validated.
Originality/value
The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared to the pure DRL method. The proposed hybrid control strategy makes up for the shortcomings of two-dimensional light detection and ranging and other practical limitations.
Jinbao Fang, Qiyu Sun, Yukun Chen and Yang Tang
Abstract
Purpose
This work aims to combine cloud robotics technologies with deep reinforcement learning to build a distributed training architecture and accelerate the learning procedure of autonomous systems. In particular, a distributed training architecture for navigating unmanned aerial vehicles (UAVs) in complicated dynamic environments is proposed.
Design/methodology/approach
This study proposes a distributed training architecture named experience-sharing learner-worker (ESLW) for deep reinforcement learning to navigate UAVs in dynamic environments, inspired by cloud-based techniques. With the ESLW architecture, multiple worker nodes operating in different environments generate training data in parallel, and the learner node then trains a policy on the training data collected by the worker nodes. In addition, this study proposes an extended experience replay (EER) strategy so that the method can be applied to experience sequences, improving training efficiency. To learn more about dynamic environments, convolutional long short-term memory (ConvLSTM) modules are adopted to extract spatiotemporal information from training sequences.
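The learner-worker split with sequence-level replay can be sketched as below. This is a simplified, single-process sketch of the idea only; the class and function names are hypothetical, and the real ESLW system runs workers as separate cloud nodes.

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared buffer the learner samples from; workers push whole
    experience *sequences* (the EER idea) rather than single steps."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push_sequence(self, seq):
        self.buf.append(seq)

    def sample(self, k):
        # The learner draws whole sequences, e.g. to feed a ConvLSTM.
        return random.sample(list(self.buf), k)

def worker(env_id, episode_len=4):
    # Stand-in for a worker node interacting with its own environment:
    # returns one episode as a list of (s, a, r, s') transitions.
    return [(f"s{env_id}_{t}", f"a{t}", 1.0, f"s{env_id}_{t+1}")
            for t in range(episode_len)]

buffer = ReplayBuffer()
for env_id in range(3):            # three parallel worker nodes
    buffer.push_sequence(worker(env_id))

batch = buffer.sample(2)           # the learner node trains on sequences
```

Storing sequences rather than isolated transitions is what lets a recurrent module such as a ConvLSTM see temporal context at training time.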
Findings
Experimental results demonstrate that the ESLW architecture and the EER strategy accelerate convergence, and that the ConvLSTM modules excel at extracting sequential information when navigating UAVs in dynamic environments.
Originality/value
Inspired by cloud robotics technologies, this study proposes a distributed ESLW architecture for navigating UAVs in dynamic environments. In addition, the EER strategy is proposed to speed up training on experience sequences, and the ConvLSTM modules are added to the networks to make full use of the sequential experiences.
Volodymyr Novykov, Christopher Bilson, Adrian Gepp, Geoff Harris and Bruce James Vanstone
Abstract
Purpose
Machine learning (ML), and deep learning in particular, is gaining traction across a myriad of real-life applications. Portfolio management is no exception. This paper provides a systematic literature review of deep learning applications for portfolio management. The findings are likely to be valuable for industry practitioners and researchers alike, experimenting with novel portfolio management approaches and furthering investment management practice.
Design/methodology/approach
This review follows the guidance and methodology of Linnenluecke et al. (2020), Massaro et al. (2016) and Fisch and Block (2018) to first identify relevant literature based on an appropriately developed search phrase, filter the resultant set of publications and present descriptive and analytical findings of the research itself and its metadata.
Findings
The authors find a strong dominance of reinforcement learning algorithms applied to the field, given their through-time portfolio management capabilities. Other well-known deep learning models, such as the convolutional neural network (CNN) and the recurrent neural network (RNN) and its derivatives, have been shown to be well suited for time-series forecasting. Most recently, the number of papers published in the field has been increasing, potentially driven by computational advances, hardware accessibility and data availability. The review shows several promising applications and identifies future research opportunities, including better balance on the risk-reward spectrum, novel ways to reduce data dimensionality and pre-process the inputs, a stronger focus on direct weights generation, novel deep learning architectures and consistent data choices.
Originality/value
Several systematic reviews have been conducted with a broader focus of ML applications in finance. However, to the best of the authors’ knowledge, this is the first review to focus on deep learning architectures and their applications in the investment portfolio management problem. The review also presents a novel universal taxonomy of models used.
Mu Shengdong, Wang Fengyu, Xiong Zhengxian, Zhuang Xiao and Zhang Lunfeng
Abstract
Purpose
With the advent of the web computing era, the transmission mode of the Internet of Everything has caused an explosion in data volume, which has brought severe challenges to traditional routing protocols. This paper aims to solve the problem of high blocking probability caused by the increase in data volume by combining deep reinforcement learning: the limitations of existing routing protocols under rapid data growth are elaborated, and the routing problem is remodeled as a Markov decision process. Finally, the correctness of the proposed algorithm is verified by simulation.
Design/methodology/approach
The limitations of the existing routing protocols under the condition of rapid data growth are elaborated, and the routing problem is remodeled as a Markov decision process. Based on this, a deep reinforcement learning method is used to select the next-hop router for each data transmission task, thereby minimizing the length of the data transmission path while avoiding data congestion.
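Modeling next-hop selection as a Markov decision process can be illustrated with a tabular Q-learning sketch on a toy topology. The graph, rewards and hyperparameters below are all hypothetical stand-ins (the paper uses a deep network, not a table), chosen only to show the state = current router, action = next hop formulation.

```python
import random

random.seed(0)

# Hypothetical topology: each router's neighbours (not from the paper).
links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
DEST = "E"

Q = {(n, nxt): 0.0 for n in links for nxt in links[n]}
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(node, nxt):
    # Reward shaped to minimise path length: -1 per hop, bonus at the goal.
    return (10.0, True) if nxt == DEST else (-1.0, False)

for _ in range(500):                       # training episodes
    node = random.choice(["A", "B", "C"])
    while node != DEST:
        nbrs = links[node]
        nxt = random.choice(nbrs) if random.random() < eps else \
              max(nbrs, key=lambda n: Q[(node, n)])
        r, done = step(node, nxt)
        target = r if done else r + gamma * max(Q[(nxt, n)] for n in links[nxt])
        Q[(node, nxt)] += alpha * (target - Q[(node, nxt)])
        node = nxt

# Greedy routing after training follows the learned next-hop choices.
path, node = ["A"], "A"
while node != DEST:
    node = max(links[node], key=lambda n: Q[(node, n)])
    path.append(node)
```

The per-hop penalty is what makes the greedy policy prefer short paths; a congestion-aware variant would additionally penalise loaded links in `step`.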
Findings
Simulation results show that the proposed method can significantly reduce the probability of data congestion and increase network throughput.
Originality/value
This paper proposes an intelligent routing algorithm for the network congestion caused by the explosive growth of data volume in the big data era. With the help of deep reinforcement learning, the next-hop router can be selected dynamically according to the current network state, thereby reducing the probability of congestion and improving network throughput.
Ji Fang, Vincent C.S. Lee and Haiyan Wang
Abstract
Purpose
This paper explores an optimal service resource management strategy, a continuous challenge for health information services seeking to enhance service performance, optimise service resource utilisation and deliver interactive health information services.
Design/methodology/approach
An adaptive optimal service resource management strategy was developed considering a value co-creation model in health information service, with a focus on collaboration and interaction with users. The deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
Findings
The results indicate that the proposed service resource management strategy, which considers user co-creation in the service delivery process, improved both the service provider's business revenue and users' individual benefits.
Practical implications
The findings may facilitate the design and implementation of health information services that can achieve a high user service experience with low service operation costs.
Originality/value
This study is amongst the first to propose a service resource management model in I-HISS, considering the value co-creation of the user in the service-dominant logic. The novel artificial intelligence algorithm is developed using the deep reinforcement learning method to learn the adaptive service resource management strategy. The results emphasise user engagement in the health information service process.
Ke Xu, Fengge Wu and Junsuo Zhao
Abstract
Purpose
Recently, deep reinforcement learning has been developing rapidly, showing its power to solve difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still use classical control techniques such as proportional-integral-derivative (PID) and sliding mode control as the major solutions, facing problems with adaptability and automation.
Design/methodology/approach
In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer episodes of learning than model-free algorithms.
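The flavour of a model-based, non-gradient search can be sketched with random-shooting planning over a toy one-axis attitude model. Everything here is a hypothetical stand-in: the double-integrator dynamics, cost weights and horizon are illustrative only, not the paper's satellite model or algorithm.

```python
import random

random.seed(1)

DT, I = 0.1, 1.0          # time step (s) and moment of inertia (toy values)

def model(state, torque):
    """Assumed-known toy 1-axis attitude model: a double integrator."""
    angle, rate = state
    rate += torque / I * DT
    angle += rate * DT
    return (angle, rate)

def rollout_cost(state, torques):
    # Simulate the torque sequence in the model, score the end state.
    for u in torques:
        state = model(state, u)
    angle, rate = state
    return angle**2 + 0.1 * rate**2   # distance from target attitude (0, 0)

def plan(state, horizon=10, samples=200):
    """Model-based random-shooting search: sample torque sequences in the
    model and keep the first action of the cheapest one (no gradients)."""
    best, best_cost = None, float("inf")
    for _ in range(samples):
        seq = [random.uniform(-1, 1) for _ in range(horizon)]
        c = rollout_cost(state, seq)
        if c < best_cost:
            best, best_cost = seq, c
    return best[0]

state = (0.5, 0.0)                    # start 0.5 rad off target
for _ in range(50):                   # replan at every step, MPC style
    state = model(state, plan(state))
```

Because candidate sequences are scored only through model rollouts, no gradient of the policy is needed, which is the "non-gradient heuristic search" character the abstract describes.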
Findings
Simulation experiments show that when classical control fails, this approach can find a solution and reach the target within hundreds of episodes of exploration and learning.
Originality/value
This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it needs no prior knowledge of the satellite or its orbit, can adapt to different kinds of situations by learning from data, and can adapt to different kinds of satellites and different tasks through transfer learning.
Minghui Zhao, Xian Guo, Xuebo Zhang, Yongchun Fang and Yongsheng Ou
Abstract
Purpose
This paper aims to automatically plan sequence for complex assembly products and improve assembly efficiency.
Design/methodology/approach
An assembly sequence planning system for workpieces (ASPW) based on deep reinforcement learning (DRL) is proposed in this paper. However, applying DRL to this problem poses enormous challenges due to the sparse reward and the lack of a training environment. In this paper, a novel ASPW-DQN algorithm is proposed and a training platform is built to overcome these challenges.
Findings
The system can obtain good decision-making results and a generalized model suitable for other assembly problems. The experiments conducted in Gazebo show good results and the great potential of this approach.
Originality/value
The proposed ASPW-DQN unites curriculum learning and parameter transfer, which can avoid the explosive growth of assembly relations and improve system efficiency. It is combined with the realistic physics simulation engine Gazebo to provide the required training environment. Additionally, owing to the generalization ability of deep neural networks, the result can be easily applied to other similar tasks.
Abstract
Purpose
In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) in solving very complex problems (e.g. selecting input features, considering scarce resources and constraints) that cannot be solved by classical machine learning. The authors include a comparative study of intrusion detection based on statistical machine learning and representational learning, using the knowledge discovery in databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012 datasets.
Design/methodology/approach
The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply the dueling double deep Q-networks architecture enabled with costly features, k-nearest neighbors (K-NN), support-vector machines (SVM) and convolutional neural networks (CNN).
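The two ideas named in the architecture can be sketched together: a dueling head decomposes Q-values into a state value and action advantages, and the double-DQN target lets the online network select the action while the target network evaluates it. The layer sizes and random weights below are illustrative stand-ins, not the authors' network.

```python
import numpy as np

rng = np.random.default_rng(0)

def dueling_q(features, w_v, w_a):
    """Dueling head: split into state value V and advantages A, then
    combine as Q = V + (A - mean(A)) so V and A stay identifiable."""
    v = features @ w_v            # scalar state value
    a = features @ w_a            # one advantage per action
    return v + (a - a.mean())

# Toy linear "networks": online and target parameter sets.
dim, n_actions = 4, 3
w_v_online, w_a_online = rng.normal(size=(dim, 1)), rng.normal(size=(dim, n_actions))
w_v_target, w_a_target = rng.normal(size=(dim, 1)), rng.normal(size=(dim, n_actions))

s_next, reward_, gamma = rng.normal(size=dim), 1.0, 0.99

# Double DQN target: the online net *selects* the next action, the
# target net *evaluates* it, which reduces overestimation bias.
a_star = int(np.argmax(dueling_q(s_next, w_v_online, w_a_online)))
td_target = reward_ + gamma * dueling_q(s_next, w_v_target, w_a_target)[a_star]
```

Mean-centering the advantages forces the average Q-value to equal V, which is the standard identifiability trick of the dueling architecture.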
Findings
Machine learning-based intrusion detection systems are trained on historical datasets, which leads to model drift and a lack of generalization, whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset, whereas supervised learning simply learns from collected data and requires less computational resources.
Research limitations/implications
All machine learning models achieved high accuracy and performance values. One potential reason is that both datasets are simulated and not realistic. It was not clear whether a validation was ever performed to show that the data were collected from real network traffic.
Practical implications
The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.
Originality/value
The research applied the dueling double deep Q-networks architecture enabled with costly features to build network-based intrusion detection from network traffic. This research presents a comparative study of reinforcement-learning-based intrusion detection with counterparts built with statistical and representational machine learning.
Abstract
Purpose
English original movies play an important role in English learning and communication. To find the required movies from a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have solved the problem of information overload, they still have limitations in the cases of cold start and sparse data.
Design/methodology/approach
To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to solve the cold-start and sparse-data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely.
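A reward that mixes cosine and Euclidean distance can be sketched as below. This is an illustrative stand-in, not the paper's exact formula: `lam` and the toy embeddings are hypothetical, and the intuition is simply that cosine similarity rewards pointing in the user's preferred direction while the Euclidean term penalises being far away in the embedding space.

```python
import numpy as np

def reward(action_vec, target_vec, lam=0.5):
    """Hypothetical combined reward over item embeddings:
    lam * cosine_similarity - (1 - lam) * euclidean_distance."""
    cos = action_vec @ target_vec / (
        np.linalg.norm(action_vec) * np.linalg.norm(target_vec) + 1e-8)
    euc = np.linalg.norm(action_vec - target_vec)
    return lam * cos - (1 - lam) * euc

liked = np.array([1.0, 0.0])        # embedding of a movie the user liked
close = np.array([0.9, 0.1])        # a similar recommendation
far = np.array([-1.0, 0.0])         # opposite taste
```

Because the two terms disagree on scaled copies of the same direction, the combination gives a smoother, less degenerate signal than either distance alone, which is one plausible reading of the "avoids premature convergence" claim.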
Findings
To verify the feasibility and validity of the proposed algorithm, it is compared with the state of the art in terms of RMSE, recall rate and accuracy in experiments on the MovieLens English original movie dataset. Experimental results show that the proposed algorithm is superior to the conventional algorithms in all indicators.
Originality/value
Applied to recommending English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
Jiajun Xu, Linsen Xu, Gaoxin Cheng, Jia Shi, Jinfu Liu, Xingcan Liang and Shengyao Fan
Abstract
Purpose
This paper aims to propose a bilateral robotic system for lower extremity hemiparesis rehabilitation, with which hemiplegic patients can complete rehabilitation exercises voluntarily under the robot's assistance. Reinforcement learning is included in the robot control system, efficiently enhancing the muscle activation of the impaired limbs (ILs) while ensuring the patients' safety.
Design/methodology/approach
A bilateral leader–follower robotic system is constructed for lower extremity hemiparesis rehabilitation, where the leader robot interacts with the healthy limb (HL) and the follower robot is worn by the IL. The therapeutic training is transferred from the HL to the IL with the assistance of the robot, and the IL follows the motion trajectory prescribed by the HL, an approach called mirror therapy. Model reference adaptive impedance control is used for the leader robot, and a reinforcement learning controller is designed for the follower robot. The reinforcement learning aims to increase the muscle activation of the IL while ensuring that its motion can be mastered by the HL for safety. An asynchronous algorithm with improved experience replay is designed to run in parallel on multiple robotic platforms to reduce learning time.
Findings
Clinical tests show that lower extremity hemiplegic patients can rehabilitate efficiently using the robotic system. The proposed scheme also outperforms other state-of-the-art methods in tracking performance, muscle activation, learning efficiency and rehabilitation efficacy.
Originality/value
Using the proposed robotic system, lower extremity hemiplegic patients with different movement abilities can obtain better rehabilitation efficacy.