Search results (1–7 of 7)
Xiaona Wang, Jiahao Chen and Hong Qiao
Abstract
Purpose
Limited by the types of sensors, the state information available to musculoskeletal robots with highly redundant, nonlinear muscles is often incomplete, which makes control a bottleneck problem. The aim of this paper is to design a method that improves the motion performance of musculoskeletal robots in partially observable scenarios, and to leverage knowledge of the robot's own body to enhance the algorithm's adaptability to musculoskeletal robots that have undergone changes.
Design/methodology/approach
A memory and attention-based reinforcement learning method is proposed for musculoskeletal robots with prior knowledge of muscle synergies. First, to deal with the partially observed states available to musculoskeletal robots, a memory and attention-based network architecture is proposed for inferring more sufficient and intrinsic states. Second, inspired by the muscle synergy hypothesis in neuroscience, prior knowledge of a musculoskeletal robot's muscle synergies is embedded in the network structure and the reward shaping.
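The abstract does not give the architecture's details, but the core idea of attending over a memory of past partial observations to infer a richer state can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the projection matrices `W_k`, `W_v` and all shapes are assumed for the sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend_memory(memory, query, W_k, W_v):
    """Infer a state estimate from a buffer of past partial observations.

    memory: (T, d_obs) past observations; query: (d_k,) current context.
    W_k, W_v: illustrative key/value projections (not from the paper).
    """
    keys = memory @ W_k                              # (T, d_k)
    values = memory @ W_v                            # (T, d_v)
    scores = keys @ query / np.sqrt(query.shape[0])  # (T,) scaled dot-product
    weights = softmax(scores)                        # attention over the memory
    return weights @ values                          # (d_v,) pooled state estimate
```

The pooled vector would then be fed to the actor and critic networks in place of the raw (incomplete) observation.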
Findings
Based on systematic validation, the proposed method demonstrates superiority over the traditional twin delayed deep deterministic policy gradient (TD3) algorithm. A musculoskeletal robot with highly redundant, nonlinear muscles is adopted to implement goal-directed tasks. In the case of 21-dimensional states, the learning efficiency and accuracy are significantly improved compared with the traditional TD3 algorithm; in the case of 13-dimensional states without velocities and end-effector information, the traditional TD3 is unable to complete the reaching tasks, while the proposed method breaks through this bottleneck.
Originality/value
In this paper, a novel memory and attention-based reinforcement learning method with prior knowledge of muscle synergies is proposed for musculoskeletal robots to deal with partially observable scenarios. Compared with the existing methods, the proposed method effectively improves the performance. Furthermore, this paper promotes the fusion of neuroscience and robotics.
Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang and Tao Pang
Abstract
Purpose
Because of the various advantages of reinforcement learning (RL), this study uses RL to train unmanned aerial vehicles (UAVs) to perform two tasks: target search and cooperative obstacle avoidance.
Design/methodology/approach
This study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicle prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.
Findings
This study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM, which is designed to aid in training the recurrent representation using a transition model, a reward recovery model and an observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information, and it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks, and the algorithm outperforms baselines in both environments.
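The three auxiliary models named above (transition, reward recovery, observation recovery) can be combined into one training objective for the recurrent representation. The sketch below is a hypothetical numpy rendering of that idea: the linear heads `W_trans`, `W_rew`, `W_obs` and the unweighted sum are assumptions, not the paper's actual losses.

```python
import numpy as np

def uavpm_losses(h, h_next, obs, reward, W_trans, W_rew, W_obs):
    """Sum the three auxiliary objectives described for UAVPM.

    h, h_next: recurrent states at t and t+1; obs: observation at t;
    reward: scalar reward at t. All heads are illustrative stand-ins.
    """
    pred_h = np.tanh(h @ W_trans)              # transition model
    pred_r = float(h @ W_rew)                  # reward recovery head
    pred_o = h @ W_obs                         # observation recovery head
    l_trans = np.mean((pred_h - h_next) ** 2)  # predict next latent state
    l_rew = (pred_r - reward) ** 2             # recover the reward signal
    l_obs = np.mean((pred_o - obs) ** 2)       # reconstruct the observation
    return l_trans + l_rew + l_obs
```

Because these heads are only used to shape the representation, they can be dropped at inference time, which matches the abstract's note that UAVPM plays no role in the inference phase.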
Originality/value
It is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.
Ji Fang, Vincent C.S. Lee and Haiyan Wang
Abstract
Purpose
This paper explores optimal service resource management strategy, a continuous challenge for health information service to enhance service performance, optimise service resource utilisation and deliver interactive health information service.
Design/methodology/approach
An adaptive optimal service resource management strategy was developed considering a value co-creation model in health information service, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
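At its core, allocating service resources from user-engagement feedback is a sequential decision problem. As a minimal stand-in for the deep RL used in the paper, the tabular Q-learning step below treats discretised engagement levels as states and provision/adaptation choices as actions; the state/action encoding and learning rates are assumptions for the sketch.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for service resource allocation.

    Q: (n_states, n_actions) value table; `state` indexes a discretised
    user-engagement level, `action` a provision/adaptation choice.
    """
    td_target = reward + gamma * Q[next_state].max()   # bootstrapped target
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```

The paper's deep variant would replace the table with a neural network, but the update logic is analogous.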
Findings
The results indicate that the proposed service resource management strategy, considering user co-creation in the service delivery process, improved both the service provider's business revenue and users' individual benefits.
Practical implications
The findings may facilitate the design and implementation of health information services that can achieve a high user service experience with low service operation costs.
Originality/value
This study is amongst the first to propose a service resource management model in I-HISS, considering the value co-creation of the user in the service-dominant logic. The novel artificial intelligence algorithm is developed using the deep reinforcement learning method to learn the adaptive service resource management strategy. The results emphasise user engagement in the health information service process.
Lin Kang, Junjie Chen, Jie Wang and Yaqi Wei
Abstract
Purpose
In order to meet the different quality of service (QoS) requirements of vehicle-to-infrastructure (V2I) and multiple vehicle-to-vehicle (V2V) links in vehicle networks, an efficient V2V spectrum access mechanism is proposed in this paper.
Design/methodology/approach
A long short-term memory-based multi-agent hybrid proximal policy optimization (LSTM-H-PPO) algorithm is proposed, through which distributed spectrum access and continuous power control of V2V links are realized.
Findings
Simulation results show that compared with the baseline algorithm, the proposed algorithm has significant advantages in terms of total system capacity, payload delivery success rate of V2V link and convergence speed.
Originality/value
The LSTM layer uses time-sequence information to estimate an accurate system state, which makes V2V spectrum access choices based on local observations effective. The hybrid PPO framework shares training parameters among agents, which speeds up the entire training process. The proposed algorithm adopts centralized training with distributed execution, so that each agent can achieve optimal spectrum access based on local observation information with less signaling overhead.
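The parameter-sharing idea can be sketched concretely: every V2V agent reuses the same LSTM weights (centralised training) but keeps its own hidden state and acts only on its local observation (distributed execution). The cell below is a generic LSTM step in numpy, with all shapes assumed; it is not the paper's network.

```python
import numpy as np

def lstm_step(x, h, c, params):
    """One LSTM cell step with shared parameters.

    Each agent calls this with the same `params` dict but its own
    (h, c) state and local observation x.
    """
    z = np.concatenate([x, h]) @ params["W"] + params["b"]
    i, f, o, g = np.split(z, 4)                  # input/forget/output/candidate
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(i), sig(f), sig(o)
    c_new = f * c + i * np.tanh(g)               # updated cell state
    h_new = o * np.tanh(c_new)                   # updated hidden state
    return h_new, c_new
```

Sharing `params` across agents is what lets experience from all links update a single policy during training, while execution stays fully local.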
Rong Jiang, Bin He, Zhipeng Wang, Xu Cheng, Hongrui Sang and Yanmin Zhou
Abstract
Purpose
Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods, such as deep reinforcement learning and imitation learning, show more promising potential to cope with the challenges brought by increasingly complex tasks and environments, and have become a hot research topic in the field of robot skill learning. However, the contradiction between the difficulty of collecting robot–environment interaction data and low data efficiency causes all these methods to face a serious data dilemma, which has become one of the key issues restricting their development. Therefore, this paper aims to comprehensively sort out and analyze the causes of and solutions for the data dilemma in robot skill learning.
Design/methodology/approach
First, this review analyzes the causes of the data dilemma based on a classification and comparison of data-driven methods for robot skill learning. Then, the existing methods used to solve the data dilemma are introduced in detail. Finally, this review discusses the remaining open challenges and promising research topics for solving the data dilemma in the future.
Findings
This review shows that simulation–reality combination, state representation learning and knowledge sharing are crucial for overcoming the data dilemma of robot skill learning.
Originality/value
To the best of the authors’ knowledge, there are no surveys that systematically and comprehensively sort out and analyze the data dilemma in robot skill learning in the existing literature. It is hoped that this review can be helpful to better address the data dilemma in robot skill learning in the future.
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao
Abstract
Purpose
This paper aims to study agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of the demonstration data and reduces the dimensionality of the agent's exploration space to speed up training convergence.
Design/methodology/approach
First, a decay weight function is set in the objective function of the agent's training to combine both types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent's learning process, improving both the data utilization rate and the agent's learning speed.
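The decay-weight idea can be made concrete with a small sketch: early in training the objective leans on the imitation loss, and the weight decays so that the RL loss dominates later. The exponential schedule and the time constant `tau` are assumed forms; the paper only states that a decay weight function is used.

```python
import math

def combined_loss(il_loss, rl_loss, step, tau=1000.0):
    """Blend imitation and reinforcement objectives with a decaying weight.

    w starts at 1 (pure imitation) and decays toward 0 (pure RL);
    the exponential form is an illustrative assumption.
    """
    w = math.exp(-step / tau)
    return w * il_loss + (1.0 - w) * rl_loss
```

This lets the agent bootstrap from expert demonstrations without being permanently bound by their limitations.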
Findings
The method is superior to other algorithms in terms of convergence speed and decision stability, avoids learning reward values from scratch, and breaks through the restrictions imposed by the demonstration data.
Originality/value
The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms based on the experience of demonstrating trajectories. The demonstration data set used in IL and the experience samples obtained in the process of RL are coupled and used to improve the data utilization efficiency and the generalization ability of the agent.
Weiwen Mu, Wenbai Chen, Huaidong Zhou, Naijun Liu, Haobin Shi and Jingchen Li
Abstract
Purpose
This paper aims to solve the problem of the low assembly success rate of 3C assembly lines designed with classical control algorithms, which is caused by inevitable random disturbances and other factors. By incorporating intelligent algorithms into the assembly line, the assembly process can be extended to uncertain assembly scenarios.
Design/methodology/approach
This work proposes a reinforcement learning framework based on digital twins. First, the authors used Unity3D to build a simulation environment that matches the real scene and achieved data synchronization between the real environment and the simulation environment through the robot operating system. Then, the authors trained the reinforcement learning model in the simulation environment. Finally, by creating a digital twin environment, the authors transferred the skill learned from the simulation to the real environment and achieved stable algorithm deployment in real-world scenarios.
Findings
In this work, the authors have completed the transfer of skill-learning algorithms from virtual to real environments by establishing a digital twin environment. On the one hand, the experiments demonstrate the advancement of the algorithm and the feasibility of applying digital twins to reinforcement learning transfer. On the other hand, the experimental results provide a reference for the application of digital twins in 3C assembly scenarios.
Originality/value
In this work, the authors designed a new encoder structure in the simulation environment to encode image information, which improved the model's perception of the environment. At the same time, the authors combined a fixed strategy with the reinforcement learning strategy to learn skills, which improved the convergence rate and stability of skill learning. Finally, the authors transferred the learned skills to the physical platform through digital twin technology and realized safe operation of the flexible printed circuit assembly task.
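The abstract does not specify how the fixed strategy and the RL strategy are combined; one common form is a residual formulation, where the learned policy adds a small correction on top of a scripted action. The sketch below assumes that formulation (including the `scale` factor), so it should be read as an illustration rather than the paper's method.

```python
import numpy as np

def hybrid_action(obs, scripted, policy, scale=0.1):
    """Fixed strategy plus a learned residual correction (assumed form).

    scripted: deterministic baseline controller; policy: learned RL
    policy producing a correction in the same action space.
    """
    base = scripted(obs)          # stable scripted action
    residual = policy(obs)        # learned correction
    return base + scale * residual
```

Keeping the scripted controller in the loop bounds how far the learned component can deviate, which is one way such hybrids improve convergence and stability.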