Search results (1–7 of 7)
Xiaona Wang, Jiahao Chen and Hong Qiao
Abstract
Purpose
Limited by the types of sensors, the state information available to musculoskeletal robots with highly redundant, nonlinear muscles is often incomplete, which makes control a bottleneck problem. The aim of this paper is to design a method that improves the motion performance of musculoskeletal robots in partially observable scenarios, and to leverage knowledge of the robot's own body to enhance the algorithm's adaptability to musculoskeletal robots that have undergone changes.
Design/methodology/approach
A memory and attention-based reinforcement learning method is proposed for musculoskeletal robots with prior knowledge of muscle synergies. First, to deal with the partially observed states available to musculoskeletal robots, a memory and attention-based network architecture is proposed for inferring more sufficient and intrinsic states. Second, inspired by the muscle synergy hypothesis in neuroscience, prior knowledge of a musculoskeletal robot's muscle synergies is embedded in the network structure and the reward shaping.
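The abstract does not give the architecture's details, but the core idea of attending over a memory of past partial observations to infer a richer state can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the projection matrices `W_k`, `W_v` and all shapes are assumed for the sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend_memory(memory, query, W_k, W_v):
    """Infer a state estimate from a buffer of past partial observations.

    memory: (T, d_obs) past observations; query: (d_k,) current context.
    W_k, W_v: illustrative key/value projections (not from the paper).
    """
    keys = memory @ W_k                              # (T, d_k)
    values = memory @ W_v                            # (T, d_v)
    scores = keys @ query / np.sqrt(query.shape[0])  # (T,) scaled dot-product
    weights = softmax(scores)                        # attention over the memory
    return weights @ values                          # (d_v,) pooled state estimate
```

The pooled vector would then be fed to the actor and critic networks in place of the raw (incomplete) observation.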
Findings
Based on systematic validation, the proposed method demonstrates superiority over the traditional twin delayed deep deterministic policy gradient (TD3) algorithm. A musculoskeletal robot with highly redundant, nonlinear muscles is adopted to implement goal-directed tasks. In the case of 21-dimensional states, the learning efficiency and accuracy are significantly improved compared with the traditional TD3 algorithm; in the case of 13-dimensional states without velocities and end-effector information, the traditional TD3 is unable to complete the reaching tasks, while the proposed method breaks through this bottleneck.
Originality/value
In this paper, a novel memory and attention-based reinforcement learning method with prior knowledge of muscle synergies is proposed for musculoskeletal robots to deal with partially observable scenarios. Compared with the existing methods, the proposed method effectively improves the performance. Furthermore, this paper promotes the fusion of neuroscience and robotics.
Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang and Tao Pang
Abstract
Purpose
Because of the various advantages of reinforcement learning (RL), this study uses RL to train unmanned aerial vehicles (UAVs) to perform two tasks: target search and cooperative obstacle avoidance.
Design/methodology/approach
This study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicle prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.
Findings
This study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM, which is designed to aid in training the recurrent representation using a transition model, a reward recovery model and an observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information, and it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks, and the algorithm outperforms baselines in both environments.
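The three auxiliary models named above (transition, reward recovery, observation recovery) can be combined into one training objective for the recurrent representation. The sketch below is a hypothetical numpy rendering of that idea: the linear heads `W_trans`, `W_rew`, `W_obs` and the unweighted sum are assumptions, not the paper's actual losses.

```python
import numpy as np

def uavpm_losses(h, h_next, obs, reward, W_trans, W_rew, W_obs):
    """Sum the three auxiliary objectives described for UAVPM.

    h, h_next: recurrent states at t and t+1; obs: observation at t;
    reward: scalar reward at t. All heads are illustrative stand-ins.
    """
    pred_h = np.tanh(h @ W_trans)              # transition model
    pred_r = float(h @ W_rew)                  # reward recovery head
    pred_o = h @ W_obs                         # observation recovery head
    l_trans = np.mean((pred_h - h_next) ** 2)  # predict next latent state
    l_rew = (pred_r - reward) ** 2             # recover the reward signal
    l_obs = np.mean((pred_o - obs) ** 2)       # reconstruct the observation
    return l_trans + l_rew + l_obs
```

Because these heads are only used to shape the representation, they can be dropped at inference time, which matches the abstract's note that UAVPM plays no role in the inference phase.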
Originality/value
It is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.
Ji Fang, Vincent C.S. Lee and Haiyan Wang
Abstract
Purpose
This paper explores optimal service resource management strategy, a continuous challenge for health information service to enhance service performance, optimise service resource utilisation and deliver interactive health information service.
Design/methodology/approach
An adaptive optimal service resource management strategy was developed considering a value co-creation model in health information service, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
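At its core, allocating service resources from user-engagement feedback is a sequential decision problem. As a minimal stand-in for the deep RL used in the paper, the tabular Q-learning step below treats discretised engagement levels as states and provision/adaptation choices as actions; the state/action encoding and learning rates are assumptions for the sketch.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for service resource allocation.

    Q: (n_states, n_actions) value table; `state` indexes a discretised
    user-engagement level, `action` a provision/adaptation choice.
    """
    td_target = reward + gamma * Q[next_state].max()   # bootstrapped target
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```

The paper's deep variant would replace the table with a neural network, but the update logic is analogous.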
Findings
The results indicate that the proposed service resource management strategy, considering user co-creation in the service delivery process, improved both the service provider's business revenue and users' individual benefits.
Practical implications
The findings may facilitate the design and implementation of health information services that can achieve a high user service experience with low service operation costs.
Originality/value
This study is amongst the first to propose a service resource management model in I-HISS, considering the value co-creation of the user in the service-dominant logic. The novel artificial intelligence algorithm is developed using the deep reinforcement learning method to learn the adaptive service resource management strategy. The results emphasise user engagement in the health information service process.
Lin Kang, Junjie Chen, Jie Wang and Yaqi Wei
Abstract
Purpose
In order to meet the different quality of service (QoS) requirements of vehicle-to-infrastructure (V2I) and multiple vehicle-to-vehicle (V2V) links in vehicle networks, an efficient V2V spectrum access mechanism is proposed in this paper.
Design/methodology/approach
A long short-term memory-based multi-agent hybrid proximal policy optimization (LSTM-H-PPO) algorithm is proposed, through which distributed spectrum access and continuous power control of V2V links are realized.
Findings
Simulation results show that compared with the baseline algorithm, the proposed algorithm has significant advantages in terms of total system capacity, payload delivery success rate of V2V link and convergence speed.
Originality/value
The LSTM layer uses time-sequence information to estimate an accurate system state, which makes V2V spectrum access choices based on local observations effective. The hybrid PPO framework shares training parameters among agents, which speeds up the entire training process. The proposed algorithm adopts centralized training with distributed execution, so that each agent can achieve optimal spectrum access based on local observation information with less signaling overhead.
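The parameter-sharing idea can be sketched concretely: every V2V agent reuses the same LSTM weights (centralised training) but keeps its own hidden state and acts only on its local observation (distributed execution). The cell below is a generic LSTM step in numpy, with all shapes assumed; it is not the paper's network.

```python
import numpy as np

def lstm_step(x, h, c, params):
    """One LSTM cell step with shared parameters.

    Each agent calls this with the same `params` dict but its own
    (h, c) state and local observation x.
    """
    z = np.concatenate([x, h]) @ params["W"] + params["b"]
    i, f, o, g = np.split(z, 4)                  # input/forget/output/candidate
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(i), sig(f), sig(o)
    c_new = f * c + i * np.tanh(g)               # updated cell state
    h_new = o * np.tanh(c_new)                   # updated hidden state
    return h_new, c_new
```

Sharing `params` across agents is what lets experience from all links update a single policy during training, while execution stays fully local.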
Rong Jiang, Bin He, Zhipeng Wang, Xu Cheng, Hongrui Sang and Yanmin Zhou
Abstract
Purpose
Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods, such as deep reinforcement learning and imitation learning, show more promising potential to cope with the challenges brought by increasingly complex tasks and environments, and have become a hot research topic in the field of robot skill learning. However, the contradiction between the difficulty of collecting robot–environment interaction data and low data efficiency causes all these methods to face a serious data dilemma, which has become one of the key issues restricting their development. Therefore, this paper aims to comprehensively sort out and analyze the causes of and solutions for the data dilemma in robot skill learning.
Design/methodology/approach
First, this review analyzes the causes of the data dilemma based on a classification and comparison of data-driven methods for robot skill learning. Then, the existing methods used to solve the data dilemma are introduced in detail. Finally, this review discusses the remaining open challenges and promising research topics for solving the data dilemma in the future.
Findings
This review shows that simulation–reality combination, state representation learning and knowledge sharing are crucial for overcoming the data dilemma of robot skill learning.
Originality/value
To the best of the authors’ knowledge, there are no surveys that systematically and comprehensively sort out and analyze the data dilemma in robot skill learning in the existing literature. It is hoped that this review can be helpful to better address the data dilemma in robot skill learning in the future.
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao
Abstract
Purpose
This paper aims to study agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of the demonstration data and reduces the dimensionality of the agent's exploration space to speed up training convergence.
Design/methodology/approach
First, a decay weight function is set in the objective function of the agent's training to combine both types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent's learning process, improving both the data utilization rate and the agent's learning speed.
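The decay-weight idea can be made concrete with a small sketch: early in training the objective leans on the imitation loss, and the weight decays so that the RL loss dominates later. The exponential schedule and the time constant `tau` are assumed forms; the paper only states that a decay weight function is used.

```python
import math

def combined_loss(il_loss, rl_loss, step, tau=1000.0):
    """Blend imitation and reinforcement objectives with a decaying weight.

    w starts at 1 (pure imitation) and decays toward 0 (pure RL);
    the exponential form is an illustrative assumption.
    """
    w = math.exp(-step / tau)
    return w * il_loss + (1.0 - w) * rl_loss
```

This lets the agent bootstrap from expert demonstrations without being permanently bound by their limitations.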
Findings
The method is superior to other algorithms in terms of convergence speed and decision stability, avoids learning reward values from scratch, and breaks through the restrictions imposed by the demonstration data.
Originality/value
The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms based on the experience of demonstrating trajectories. The demonstration data set used in IL and the experience samples obtained in the process of RL are coupled and used to improve the data utilization efficiency and the generalization ability of the agent.
Weiwen Mu, Wenbai Chen, Huaidong Zhou, Naijun Liu, Haobin Shi and Jingchen Li
Abstract
Purpose
This paper aims to solve the problem of the low assembly success rate of 3C assembly lines designed with classical control algorithms, which is caused by inevitable random disturbances and other factors. By incorporating intelligent algorithms into the assembly line, the assembly process can be extended to uncertain assembly scenarios.
Design/methodology/approach
This work proposes a reinforcement learning framework based on digital twins. First, the authors used Unity3D to build a simulation environment that matches the real scene and achieved data synchronization between the real environment and the simulation environment through the robot operating system. Then, the authors trained the reinforcement learning model in the simulation environment. Finally, by creating a digital twin environment, the authors transferred the skill learned from the simulation to the real environment and achieved stable algorithm deployment in real-world scenarios.
Findings
In this work, the authors have completed the transfer of skill-learning algorithms from virtual to real environments by establishing a digital twin environment. On the one hand, the experiments demonstrate the advancement of the algorithm and the feasibility of applying digital twins to reinforcement learning transfer. On the other hand, the experimental results provide a reference for the application of digital twins in 3C assembly scenarios.
Originality/value
In this work, the authors designed a new encoder structure in the simulation environment to encode image information, which improved the model's perception of the environment. At the same time, the authors combined a fixed strategy with the reinforcement learning strategy to learn skills, which improved the convergence rate and stability of skill learning. Finally, the authors transferred the learned skills to the physical platform through digital twin technology and realized safe operation of the flexible printed circuit assembly task.
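The abstract does not specify how the fixed strategy and the RL strategy are combined; one common form is a residual formulation, where the learned policy adds a small correction on top of a scripted action. The sketch below assumes that formulation (including the `scale` factor), so it should be read as an illustration rather than the paper's method.

```python
import numpy as np

def hybrid_action(obs, scripted, policy, scale=0.1):
    """Fixed strategy plus a learned residual correction (assumed form).

    scripted: deterministic baseline controller; policy: learned RL
    policy producing a correction in the same action space.
    """
    base = scripted(obs)          # stable scripted action
    residual = policy(obs)        # learned correction
    return base + scale * residual
```

Keeping the scripted controller in the loop bounds how far the learned component can deviate, which is one way such hybrids improve convergence and stability.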