Search results

1 – 10 of over 4000
Article
Publication date: 24 September 2021

Guanzheng Wang, Yinbo Xu, Zhihong Liu, Xin Xu, Xiangke Wang and Jiarun Yan

Abstract

Purpose

This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to deal with the problem of low sample efficiency in DRL and speed up training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems.

Design/methodology/approach

In this paper, a fully distributed multi-UAV collision detection and avoidance approach based on DRL is proposed. Human experience is integrated into policy training via a human experience-based adviser, and a hybrid control method is introduced that combines the learning-based policy with traditional model-based control. Extensive experiments, including simulations, real flights and comparative experiments, are conducted to evaluate the performance of the approach.
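
A minimal sketch of how a human-experience adviser might be blended into DRL action selection during training; the adviser rules, the advise/sample interfaces and the blending probability are illustrative assumptions, not the authors' implementation:

```python
import random

class HumanExperienceAdviser:
    """Illustrative rule-based adviser; the rules below are placeholders for the
    human avoidance experience described in the paper, not the authors' rules."""
    def advise(self, observation):
        # e.g. if another UAV is close, suggest turning away from it
        if observation["nearest_range"] < 20.0:                 # hypothetical threshold (m)
            return "turn_right" if observation["nearest_bearing"] < 0 else "turn_left"
        return None                                             # no advice when clear

def select_action(policy, adviser, observation, advice_prob=0.3):
    """During training, follow the adviser's suggestion with some probability so
    human experience shapes exploration; otherwise sample the learned policy.
    The blending probability would typically be annealed toward zero."""
    suggestion = adviser.advise(observation)
    if suggestion is not None and random.random() < advice_prob:
        return suggestion
    return policy.sample(observation)                           # hypothetical policy interface
```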

Findings

A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that training is significantly accelerated when human experience is integrated, and the mean episode reward is higher than with the pure DRL method. The experimental results show that the DRL method with human experience integration yields a significant improvement over the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight achieved by the hybrid control method is also validated.

Originality/value

The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared to the pure DRL method. The proposed hybrid control strategy compensates for the shortcomings of two-dimensional light detection and ranging (LiDAR) and other practical limitations in applications.

Details

Industrial Robot: the international journal of robotics research and application, vol. 49 no. 2
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 7 April 2021

Jinbao Fang, Qiyu Sun, Yukun Chen and Yang Tang

Abstract

Purpose

This work aims to combine cloud robotics technologies with deep reinforcement learning to build a distributed training architecture and accelerate the learning procedure of autonomous systems. In particular, a distributed training architecture for navigating unmanned aerial vehicles (UAVs) in complicated dynamic environments is proposed.

Design/methodology/approach

Inspired by cloud-based techniques, this study proposes a distributed training architecture named experience-sharing learner-worker (ESLW) for deep reinforcement learning to navigate UAVs in dynamic environments. With the ESLW architecture, multiple worker nodes operating in different environments generate training data in parallel, and the learner node then trains a policy on the training data collected by the worker nodes. In addition, this study proposes an extended experience replay (EER) strategy so that the method can be applied to experience sequences, improving training efficiency. To better capture dynamic environments, convolutional long short-term memory (ConvLSTM) modules are adopted to extract spatiotemporal information from training sequences.
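
A minimal sketch of the learner-worker experience-sharing idea, with a plain Python queue standing in for the cloud transport; the env/policy interfaces and all names here are illustrative assumptions, not the ESLW implementation:

```python
import queue

experience_queue = queue.Queue(maxsize=10_000)        # shared buffer between nodes

def worker(env, policy, sequence_len=8):
    """Worker node: interacts with its own copy of the environment and pushes
    fixed-length experience sequences (not single transitions), matching the
    sequence-based replay described in the abstract."""
    while True:
        sequence, obs = [], env.reset()
        for _ in range(sequence_len):
            action = policy.act(obs)                   # hypothetical policy interface
            next_obs, reward, done, _ = env.step(action)
            sequence.append((obs, action, reward, next_obs, done))
            obs = next_obs
            if done:
                break
        experience_queue.put(sequence)

def learner(policy, batch_size=32):
    """Learner node: drains sequences from the shared buffer and updates the
    policy network (e.g. a ConvLSTM-based model) on batches of sequences."""
    replay = []
    while True:
        replay.append(experience_queue.get())
        if len(replay) >= batch_size:
            policy.update(replay[-batch_size:])        # hypothetical update call

# Wiring: each worker runs on its own node or thread and feeds the shared queue,
# while a single learner process drains it and trains the policy.
```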

Findings

Experimental results demonstrate that the ESLW architecture and the EER strategy accelerate convergence and that the ConvLSTM modules effectively extract sequential information when navigating UAVs in dynamic environments.

Originality/value

Inspired by cloud robotics technologies, this study proposes the distributed ESLW architecture for navigating UAVs in dynamic environments. In addition, the EER strategy is proposed to speed up training on experience sequences, and ConvLSTM modules are added to the networks to make full use of the sequential experiences.

Details

Assembly Automation, vol. 41 no. 3
Type: Research Article
ISSN: 0144-5154

Article
Publication date: 18 December 2023

Volodymyr Novykov, Christopher Bilson, Adrian Gepp, Geoff Harris and Bruce James Vanstone

Abstract

Purpose

Machine learning (ML), and deep learning in particular, is gaining traction across a myriad of real-life applications. Portfolio management is no exception. This paper provides a systematic literature review of deep learning applications for portfolio management. The findings are likely to be valuable for industry practitioners and researchers alike, experimenting with novel portfolio management approaches and furthering investment management practice.

Design/methodology/approach

This review follows the guidance and methodology of Linnenluecke et al. (2020), Massaro et al. (2016) and Fisch and Block (2018) to first identify relevant literature based on an appropriately developed search phrase, filter the resultant set of publications and present descriptive and analytical findings of the research itself and its metadata.

Findings

The authors find a strong dominance of reinforcement learning algorithms applied to the field, given their through-time portfolio management capabilities. Other well-known deep learning models, such as the convolutional neural network (CNN), the recurrent neural network (RNN) and its derivatives, have been shown to be well suited for time-series forecasting. Most recently, the number of papers published in the field has been increasing, potentially driven by computational advances, hardware accessibility and data availability. The review shows several promising applications and identifies future research opportunities, including better balance on the risk-reward spectrum, novel ways to reduce data dimensionality and pre-process the inputs, stronger focus on direct weights generation, novel deep learning architectures and consistent data choices.

Originality/value

Several systematic reviews have been conducted with a broader focus of ML applications in finance. However, to the best of the authors’ knowledge, this is the first review to focus on deep learning architectures and their applications in the investment portfolio management problem. The review also presents a novel universal taxonomy of models used.

Details

Journal of Accounting Literature, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-4607

Article
Publication date: 29 October 2020

Mu Shengdong, Wang Fengyu, Xiong Zhengxian, Zhuang Xiao and Zhang Lunfeng

Abstract

Purpose

With the advent of the web computing era, the transmission mode of the Internet of Everything has caused an explosion in data volume, which has brought severe challenges to traditional routing protocols. The limitations of existing routing protocols under rapid data growth are elaborated, and the routing problem is remodeled as a Markov decision process. This paper aims to solve the problem of high blocking probability caused by the increase in data volume by applying deep reinforcement learning. Finally, the correctness of the proposed algorithm is verified by simulation.

Design/methodology/approach

The limitations of the existing routing protocols under the condition of rapid data growth are elaborated and the routing problem is remodeled as a Markov decision process. Based on this, a deep reinforcement learning method is used to select the next-hop router for each data transmission task, thereby minimizing the length of the data transmission path while avoiding data congestion.
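
A minimal sketch of how a learned Q-network could drive next-hop selection and how the stated objectives (short paths, no congestion) might be reflected in a reward; the network interface, masking scheme and penalty values are illustrative assumptions, not the paper's design:

```python
import numpy as np

def choose_next_hop(q_network, state, neighbour_ids, epsilon=0.1):
    """Epsilon-greedy next-hop selection for one transmission task. `state`
    would encode the current node, destination and observed link loads; the
    Q-network scores every router slot and non-neighbours are masked out."""
    if np.random.rand() < epsilon:                       # exploration
        return int(np.random.choice(neighbour_ids))
    q_values = q_network.predict(state)                  # hypothetical interface, 1-D array
    mask = np.full_like(q_values, -np.inf)
    mask[neighbour_ids] = 0.0                            # only real neighbours are eligible
    return int(np.argmax(q_values + mask))

def hop_reward(path_length_increase, congested):
    """Illustrative reward: penalise longer paths and heavily penalise forwarding
    into a congested link, reflecting the objective of minimising path length
    while avoiding congestion. The penalty values are assumptions."""
    return -path_length_increase - (10.0 if congested else 0.0)
```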

Findings

Simulation results show that the proposed method can significantly reduce the probability of data congestion and increase network throughput.

Originality/value

This paper proposes an intelligent routing algorithm to address the network congestion caused by the explosive growth of data volume in the big data era. With the help of deep reinforcement learning, the next-hop router can be selected dynamically according to the current network state, thereby reducing the probability of congestion and improving network throughput.

Details

International Journal of Web Information Systems, vol. 16 no. 5
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 January 2024

Ji Fang, Vincent C.S. Lee and Haiyan Wang

Abstract

Purpose

This paper explores an optimal service resource management strategy, a continuing challenge for health information services seeking to enhance service performance, optimise service resource utilisation and deliver interactive health information services.

Design/methodology/approach

An adaptive optimal service resource management strategy was developed based on a value co-creation model in health information services, with a focus on collaboration and interaction with users. The deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
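
The paper embeds a deep reinforcement learning algorithm in I-HISS; as a simplified stand-in, a tabular Q-learning update over discretised engagement states illustrates the decision loop (the states, actions and reward composition below are assumptions, not the authors' formulation):

```python
import numpy as np

# Hypothetical discretisation of the decision problem: user engagement level
# (state) versus a service resource action. The paper uses deep RL inside
# I-HISS; a tabular update is shown only to illustrate the decision loop.
ENGAGEMENT_LEVELS = 5                                  # "inactive" ... "highly engaged"
ACTIONS = ["provide_basic", "provide_rich", "adapt_content", "hold_resources"]

q_table = np.zeros((ENGAGEMENT_LEVELS, len(ACTIONS)))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning step; the reward would combine provider revenue and user
    benefit, mirroring the value co-creation objective in the abstract."""
    best_next = np.max(q_table[next_state])
    q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])
```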

Findings

The results indicate that the proposed service resource management strategy, which considers user co-creation in the service delivery process, improved both the service provider's business revenue and users' individual benefits.

Practical implications

The findings may facilitate the design and implementation of health information services that can achieve a high user service experience with low service operation costs.

Originality/value

This study is amongst the first to propose a service resource management model in I-HISS that considers the user's value co-creation under service-dominant logic. A novel artificial intelligence algorithm is developed using deep reinforcement learning to learn the adaptive service resource management strategy. The results emphasise user engagement in the health information service process.

Details

Industrial Management & Data Systems, vol. 124 no. 3
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 16 October 2018

Ke Xu, Fengge Wu and Junsuo Zhao

Abstract

Purpose

Recently, deep reinforcement learning has been developing rapidly and has shown its power to solve difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still rely on classical control techniques such as proportional-integral-derivative (PID) and sliding mode control as their main solutions, and they face problems with adaptability and automation.

Design/methodology/approach

In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer episodes of learning than model-free algorithms.

Findings

Simulation experiments show that when classical control fails, this approach can find a solution and reach the target within hundreds of exploration and learning episodes.

Originality/value

This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it does not need prior knowledge of the satellite or its orbit, can adapt to different kinds of situations by learning from data, and can adapt to different kinds of satellites and different tasks through transfer learning.
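
A minimal sketch of a gradient-free heuristic policy search run against a dynamics model, in the spirit described here; the linear policy, scoring function and perturbation schedule are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def rollout_return(dynamics_model, policy_params, initial_state, horizon=200):
    """Score a linear state-feedback policy by simulating one episode with a
    dynamics model; the score is the negative cumulative attitude error.
    The model, policy form and error definition are all illustrative."""
    state, total = initial_state.copy(), 0.0
    for _ in range(horizon):
        action = policy_params @ state
        state = dynamics_model(state, action)
        total -= np.linalg.norm(state[:3])      # first three components as attitude error
    return total

def heuristic_policy_search(dynamics_model, initial_state, n_actions=3,
                            iterations=200, population=32, noise=0.1):
    """Gradient-free search: perturb the best-known parameters, keep improvements,
    and occasionally widen the perturbation to escape local optima."""
    n_states = initial_state.size
    best = np.zeros((n_actions, n_states))
    best_score = rollout_return(dynamics_model, best, initial_state)
    for i in range(iterations):
        scale = noise * (10.0 if i % 50 == 0 else 1.0)
        for _ in range(population):
            candidate = best + scale * np.random.randn(n_actions, n_states)
            score = rollout_return(dynamics_model, candidate, initial_state)
            if score > best_score:
                best, best_score = candidate, score
    return best
```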

Details

Industrial Robot: the international journal of robotics research and application, vol. 46 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 23 August 2019

Minghui Zhao, Xian Guo, Xuebo Zhang, Yongchun Fang and Yongsheng Ou

Abstract

Purpose

This paper aims to automatically plan assembly sequences for complex assembly products and improve assembly efficiency.

Design/methodology/approach

An assembly sequence planning system for workpieces (ASPW) based on deep reinforcement learning (DRL) is proposed in this paper. However, applying DRL to this problem poses enormous challenges owing to the sparse reward and the lack of a training environment. To overcome these challenges, a novel ASPW-DQN algorithm is proposed and a training platform is built.
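
A minimal sketch of the curriculum-plus-parameter-transfer idea: train on progressively harder assembly stages and carry network weights forward; the make_agent and agent/stage interfaces are hypothetical, not the ASPW-DQN implementation:

```python
def train_with_curriculum(make_agent, stages):
    """Train on progressively harder assembly tasks (e.g. few parts to full
    product), transferring network parameters between stages so later stages
    start from what earlier stages learned. `make_agent` and the agent/stage
    interfaces are hypothetical, not the ASPW-DQN code."""
    weights = None
    for stage_env in stages:
        agent = make_agent(stage_env)
        if weights is not None:
            agent.load_weights(weights)        # parameter transfer from the easier stage
        agent.train(episodes=500)              # episode budget is illustrative
        weights = agent.get_weights()
    return weights
```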

Findings

The system achieves good decision-making results and a generalized model suitable for other assembly problems. The experiments conducted in Gazebo show good results and the great potential of this approach.

Originality/value

The proposed ASPW-DQN combines curriculum learning and parameter transfer, which avoids the explosive growth of assembly relations and improves system efficiency. It is coupled with the realistic physics simulation engine Gazebo to provide the required training environment. Additionally, owing to the generalization ability of deep neural networks, the result can be easily applied to other similar tasks.

Details

Assembly Automation, vol. 40 no. 1
Type: Research Article
ISSN: 0144-5154

Open Access
Article
Publication date: 18 July 2022

Youakim Badr

Abstract

Purpose

In this research, the authors demonstrate the advantage of reinforcement learning (RL)-based intrusion detection systems (IDS) in solving very complex problems (e.g. selecting input features, considering scarce resources and constraints) that cannot be solved by classical machine learning. The authors include a comparative study of intrusion detection built with statistical machine learning and representational learning, using the Knowledge Discovery in Databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012 datasets.

Design/methodology/approach

The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply a dueling double deep Q-network architecture enabled with costly features, as well as k-nearest neighbors (K-NN), support-vector machines (SVM) and convolutional neural networks (CNN).
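
A minimal PyTorch sketch of the dueling double deep Q-network building blocks; the layer sizes, action count and the costly-features mechanism are not taken from the paper, and only the dueling decomposition and the double-DQN target are standard:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: a shared trunk splits into a state-value stream V(s)
    and an advantage stream A(s, a); Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Input/output sizes are illustrative for a flow-feature IDS state."""
    def __init__(self, n_features=41, n_actions=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)
        self.advantage = nn.Linear(128, n_actions)

    def forward(self, x):
        h = self.trunk(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online_net, target_net, next_states, rewards, dones, gamma=0.99):
    """Double DQN target: the online network chooses the next action and the
    target network evaluates it, reducing overestimation bias. `dones` is a
    float tensor of 0/1 episode-termination flags."""
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```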

Findings

Machine learning-based intrusion detection systems are trained on historical datasets, which leads to model drift and a lack of generalization, whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset, whereas supervised learning simply learns from collected data and requires fewer computational resources.

Research limitations/implications

All machine learning models achieved high accuracy and performance. One potential reason is that both datasets are simulated rather than realistic. It was not clear whether validation was ever performed to show that the data were collected from real network traffic.

Practical implications

The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.

Originality/value

The research applied the dueling double deep Q-network architecture enabled with costly features to build network-based intrusion detection from network traffic. It presents a comparative study of reinforcement learning-based intrusion detection against counterparts built with statistical and representational machine learning.

Article
Publication date: 16 April 2020

Qiaoling Zhou

Abstract

Purpose

English original movies play an important role in English learning and communication. To find the required movies among a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have solved the problem of information overload, they still have limitations in the case of cold start and sparse data.

Design/methodology/approach

To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to address the cold-start and sparse-data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely.
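
A minimal sketch of a reward that mixes cosine and Euclidean distance between Item2vec embeddings, as the abstract describes; the weighting and exact combination are assumptions, not the paper's formula:

```python
import numpy as np

def recommendation_reward(action_embedding, target_embedding, w_cos=0.5, w_euc=0.5):
    """Illustrative reward mixing cosine similarity (direction agreement) with a
    Euclidean-distance penalty between the recommended item's Item2vec embedding
    and the embedding of the item the user actually preferred."""
    cos = np.dot(action_embedding, target_embedding) / (
        np.linalg.norm(action_embedding) * np.linalg.norm(target_embedding) + 1e-8)
    euc = np.linalg.norm(action_embedding - target_embedding)
    return w_cos * cos - w_euc * euc
```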

Findings

To verify the feasibility and validity of the proposed algorithm, it is compared with state-of-the-art algorithms in terms of RMSE, recall rate and accuracy on the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to conventional algorithms across these indicators.

Originality/value

When the proposed algorithm is applied to recommend English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 8 February 2021

Jiajun Xu, Linsen Xu, Gaoxin Cheng, Jia Shi, Jinfu Liu, Xingcan Liang and Shengyao Fan

Abstract

Purpose

This paper aims to propose a bilateral robotic system for lower extremity hemiparesis rehabilitation, with which hemiplegic patients can complete rehabilitation exercises voluntarily with the assistance of the robot. Reinforcement learning is included in the robot control system, efficiently enhancing the muscle activation of the impaired limbs (ILs) while ensuring the patients' safety.

Design/methodology/approach

A bilateral leader–follower robotic system is constructed for lower extremity hemiparesis rehabilitation, where the leader robot interacts with the healthy limb (HL) and the follower robot is worn by the IL. The therapeutic training is transferred from the HL to the IL with the assistance of the robot, and the IL follows the motion trajectory prescribed by the HL, which is called mirror therapy. Model reference adaptive impedance control is used for the leader robot, and a reinforcement learning controller is designed for the follower robot. The reinforcement learning aims to increase the muscle activation of the IL while ensuring that its motion can be mastered by the HL for safety. An asynchronous algorithm is designed by improving experience replay so that it runs in parallel on multiple robotic platforms to reduce learning time.
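
For orientation, a minimal non-adaptive impedance-control step for a single joint illustrates the interaction law underlying the leader robot's controller; the gains and scalar model are illustrative, and the paper's model reference adaptive variant and the RL follower controller are not reproduced here:

```python
def impedance_step(x, dx, x_ref, dx_ref, f_ext, dt=0.001, M=1.0, B=20.0, K=400.0):
    """One step of a basic impedance law M*e'' + B*e' + K*e = f_ext, where
    e = x - x_ref is the deviation from the reference trajectory prescribed by
    the healthy limb and f_ext is the measured interaction force."""
    e, de = x - x_ref, dx - dx_ref
    dde = (f_ext - B * de - K * e) / M          # compliant response to the interaction force
    dx_new = dx + dde * dt
    x_new = x + dx_new * dt
    return x_new, dx_new
```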

Findings

Clinical tests show that lower extremity hemiplegic patients can rehabilitate with high efficiency using the robotic system. The proposed scheme also outperforms other state-of-the-art methods in tracking performance, muscle activation, learning efficiency and rehabilitation efficacy.

Originality/value

Using the proposed robotic system, lower extremity hemiplegic patients with different movement abilities can obtain better rehabilitation efficacy.

Details

Industrial Robot: the international journal of robotics research and application, vol. 48 no. 3
Type: Research Article
ISSN: 0143-991X
