Search results

1 – 2 of 2
Article
Publication date: 19 March 2024

Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang and Tao Pang

Because of the various advantages of reinforcement learning (RL) mentioned above, this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and…

Abstract

Purpose

Because of the various advantages of reinforcement learning (RL) mentioned above, this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and cooperative obstacle avoidance.

Design/methodology/approach

This study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicles prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.

Findings

This study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM. This study proposes the UAVPM, which is designed to aid in training the recurrent representation using the transition model, reward recovery model and observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information. In addition, it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks. The algorithm outperforms baselines in these two environments.

Originality/value

It is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.

Details

International Journal of Web Information Systems, vol. 20 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 April 2024

Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao

This paper aims to study the agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the…

Abstract

Purpose

This paper aims to study the agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of expert demonstration data and reduces the dimensionality of the agent’s exploration space to speed up the training convergence rate.

Design/methodology/approach

Firstly, the decay weight function is set in the objective function of the agent’s training to combine both types of methods, and both RL and imitation learning (IL) are considered to guide the agent's behavior when updating the policy. Second, this study designs a coupling utilization method between the demonstration trajectory and the training experience, so that samples from both aspects can be combined during the agent’s learning process, and the utilization rate of the data and the agent’s learning speed can be improved.

Findings

The method is superior to other algorithms in terms of convergence speed and decision stability, avoiding training from scratch for reward values, and breaking through the restrictions brought by demonstration data.

Originality/value

The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms based on the experience of demonstrating trajectories. The demonstration data set used in IL and the experience samples obtained in the process of RL are coupled and used to improve the data utilization efficiency and the generalization ability of the agent.

Details

International Journal of Web Information Systems, vol. 20 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

1 – 2 of 2