Search results

1 – 10 of over 1000
Article
Publication date: 8 May 2024

Hongze Wang

Abstract

Purpose

Many practical control problems require achieving multiple objectives, and these objectives often conflict with each other. Existing multi-objective evolutionary reinforcement learning algorithms cannot achieve good search results when solving such problems, so a new multi-objective evolutionary reinforcement learning algorithm with stronger search capability is needed.

Design/methodology/approach

The multi-objective reinforcement learning algorithm proposed in this paper is based on the evolutionary computation framework. In each generation, this study uses the long-short-term selection method to select parent policies. The long-term selection is based on the improvement of policy along the predefined optimization direction in the previous generation. The short-term selection uses a prediction model to predict the optimization direction that may have the greatest improvement on overall population performance. In the evolutionary stage, the penalty-based nonlinear scalarization method is used to scalarize the multi-dimensional advantage functions, and the nonlinear multi-objective policy gradient is designed to optimize the parent policies along the predefined directions.
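
The abstract gives no code, but the scalarization idea can be sketched. The helper below (the names and the penalty form are assumptions, not the authors' formula) scalarizes a multi-dimensional advantage vector by rewarding progress along a predefined optimization direction and penalizing the orthogonal component, which is what forces a policy to improve along that direction:

```python
import numpy as np

def penalized_scalarize(advantages, direction, penalty=10.0):
    """Sketch of penalty-based nonlinear scalarization (hypothetical form).

    advantages: per-objective advantage estimates, shape (n_objectives,).
    direction:  predefined optimization direction in objective space.
    penalty:    strength of the pull back toward the direction.
    """
    a = np.asarray(advantages, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    parallel = float(a @ d)        # progress along the direction
    orthogonal = a - parallel * d  # deviation from the direction
    return parallel - penalty * np.linalg.norm(orthogonal)
```

The resulting scalar can then stand in for the advantage in an ordinary policy-gradient update, so each parent policy is pushed along its assigned direction.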

Findings

The penalty-based nonlinear scalarization method can force policies to improve along the predefined optimization directions. The long-short-term optimization method can alleviate the exploration-exploitation problem, enabling the algorithm to explore unknown regions while ensuring that potential policies are fully optimized. The combination of these designs can effectively improve the performance of the final population.

Originality/value

A multi-objective evolutionary reinforcement learning algorithm with stronger search capability is proposed. This algorithm can find a Pareto policy set with better convergence, diversity and density.

Details

Robotic Intelligence and Automation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 2 March 2012

Hamed Shahbazi, Kamal Jamshidi and Amir Hasan Monadjemi

Abstract

Purpose

The purpose of this paper is to model a motor region named the mesencephalic locomotor region (MLR), which is located at the junction of the brain and the spinal cord. This model will be used for a Nao soccer-playing humanoid robot. It consists of three main parts: a High Level Decision Unit (HLDU), an MLR-Learner and the CPG layer. The authors focus on a special type of decision making named curvilinear walking.

Design/methodology/approach

The authors' model is based on stimulation of programmable central pattern generators (PCPGs) to generate curvilinear bipedal walking patterns. The PCPGs are built from adaptive Hopf oscillators. The high-level decision, i.e. curvilinear bipedal walking, is formulated as a policy-gradient learning problem over free parameters of the robot's CPG controller.
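
The adaptive Hopf oscillator at the heart of the PCPGs has a standard limit-cycle form. A minimal Euler-integration sketch is given below (parameter values are illustrative, and the adaptation rule the paper relies on is omitted):

```python
import numpy as np

def hopf_step(x, y, mu=1.0, omega=2.0 * np.pi, dt=0.001):
    """One Euler step of a Hopf oscillator. The state converges to a
    stable limit cycle of radius sqrt(mu) at angular frequency omega,
    which makes it a convenient CPG building block for gait patterns."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy
```

Summing several such oscillators with learned amplitudes and phase couplings yields the programmable pattern that the policy-gradient search then tunes.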

Findings

The paper provides a basic model for generating different types of motion in humanoid robots using only simple stimulation of a CPG layer. A suitable and fast curvilinear walk, similar to ordinary human walking, has been achieved on a Nao humanoid robot. The model can be extended to other types of humanoid robot.

Research limitations/implications

The authors' work is limited to a special type of biped locomotion. Other types of motion remain to be tested and evaluated with this model.

Practical implications

The paper introduces a bio-inspired model of skill learning for humanoid robots. It is used for the curvilinear bipedal walking pattern, a beneficial movement for soccer-playing Nao robots in RoboCup competitions.

Originality/value

The paper uses a new biological motor concept in artificial humanoid robots, which is the mesencephalic locomotor region.

Details

Industrial Robot: An International Journal, vol. 39 no. 2
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 16 April 2020

Qiaoling Zhou

Abstract

Purpose

Original English-language movies play an important role in English learning and communication. To help users find the movies they need among a large number of original English-language movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have addressed the problem of information overload, they still have limitations under cold start and sparse data.

Design/methodology/approach

To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning: the deep deterministic policy gradient (DDPG) algorithm is used to address the cold-start and sparse-data problems, and Item2vec is used to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to prevent the neural network from converging to a local optimum prematurely.
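
The abstract does not give the exact reward formula, so the following is only a plausible sketch: a reward combining the cosine similarity and the Euclidean distance between the DDPG action vector (a point in the Item2vec embedding space) and a candidate movie embedding. The weighting `alpha` and the function name are assumptions:

```python
import numpy as np

def mixed_reward(action_vec, item_vec, alpha=0.5):
    """Hypothetical reward mixing cosine similarity (direction match)
    with a Euclidean penalty (magnitude match); the combination keeps
    rewards informative and discourages premature convergence."""
    a = np.asarray(action_vec, dtype=float)
    v = np.asarray(item_vec, dtype=float)
    cos = a @ v / (np.linalg.norm(a) * np.linalg.norm(v) + 1e-8)
    euc = np.linalg.norm(a - v)
    return alpha * cos - (1.0 - alpha) * euc
```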

Findings

To verify the feasibility and validity of the proposed algorithm, it is compared with state-of-the-art algorithms on RMSE, recall rate and accuracy in experiments on the MovieLens data set of original English-language movies. The experimental results show that the proposed algorithm outperforms the conventional algorithms on all indicators.

Originality/value

When the proposed algorithm is applied to recommend original English-language movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 19 November 2021

Yanbiao Zou and Hengchang Zhou

Abstract

Purpose

This paper aims to propose a weld seam tracking method based on proximal policy optimization (PPO).

Design/methodology/approach

A neural network based on PPO takes the reference image block and the image block to be detected as a dual-channel input, predicts the translation between the two images and corrects the location of feature points in the weld image. A localization accuracy estimation network (LAE-Net) is built to update the reference image block during the welding process, which helps reduce the tracking error.
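
The abstract does not reproduce the architecture, so the PyTorch sketch below only illustrates the dual-channel idea (layer sizes are assumptions): stack the reference block and the block to be detected as two input channels and regress the translation between them. In the actual method this predictor sits inside the PPO training loop rather than being fitted by plain supervised regression:

```python
import torch
import torch.nn as nn

class TranslationNet(nn.Module):
    """Dual-channel sketch: stack the reference image block and the block
    to be detected as two channels and regress the (dx, dy) translation."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)  # predicted (dx, dy)

    def forward(self, ref_block, det_block):
        x = torch.cat([ref_block, det_block], dim=1)  # (B, 2, H, W)
        return self.head(self.features(x))
```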

Findings

Off-line simulation results show that the proposed algorithm has strong robustness and performs well on the test set of curved seam images with strong noise. In the welding experiment, the movement of welding torch is stable, the molten material is uniform and smooth and the welding error is small, which can meet the requirements of industrial production.

Originality/value

The idea of image registration is applied to weld seam tracking, and the weld seam tracking network is built on the basis of PPO. In order to further improve the tracking accuracy, the LAE-Net is constructed and the reference images can be updated.

Details

Industrial Robot: the international journal of robotics research and application, vol. 49 no. 4
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 15 May 2017

Hongbo Zhu, Minzhou Luo, Jianghai Zhao and Tao Li

Abstract

Purpose

The purpose of this paper was to present a soft landing control strategy for a biped robot to avoid and absorb the impulsive reaction forces (which weaken walking stability) caused by the landing impact between the swing foot and the ground.

Design/methodology/approach

First, a suitable trajectory of the swing foot is preplanned to avoid impulsive reaction forces in the walking direction. Second, the impulsive reaction forces of the landing impact are suppressed by on-line trajectory modification based on extended time-domain passivity control with admittance causality, which takes the reaction forces as inputs and outputs the decomposed swing-foot positions that trim off those forces.
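
The abstract does not state the control law, but admittance causality (force in, position out) is conventionally realized with a virtual mass-damper. A one-axis sketch with illustrative parameters follows; the time-domain passivity observer that gates the modification in the paper is omitted here:

```python
def admittance_offset(force, offset, velocity, m=1.0, b=50.0, dt=0.002):
    """One Euler step of a hypothetical admittance law: the measured
    landing reaction force drives a virtual mass-damper whose
    displacement is added to the swing-foot reference position,
    trimming off the impulsive force."""
    accel = (force - b * velocity) / m
    velocity += accel * dt
    offset += velocity * dt
    return offset, velocity
```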

Findings

The experimental data and results are described and analyzed, showing that the proposed soft landing control strategy can suppress the impulsive forces and improve walking stability.

Originality/value

The main contribution is that a soft landing control strategy for a biped robot was proposed to deal with the impulsive reaction forces generated by the landing impact, which enhances walking stability.

Details

Industrial Robot: An International Journal, vol. 44 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 17 October 2008

Lei Yang, James Dankert and Jennie Si

Abstract

Purpose

The purpose of this paper is to develop a mathematical framework that addresses some algorithmic features of approximate dynamic programming (ADP) by using an average-cost formulation based on the concepts of differential costs and performance gradients. Under this framework, a modified value iteration algorithm is developed that is easy to implement and can address a class of partially observable Markov decision processes (POMDPs).

Design/methodology/approach

Gradient-based policy iteration (GBPI) is a top-down, system-theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom-up, algorithmic view is provided to complement the original high-level development of GBPI. A modified value iteration is introduced, which can solve the same type of POMDP problems handled by GBPI. Numerical simulations on a queuing problem and a maze problem illustrate and verify features of the proposed algorithm as compared to GBPI.
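
As background to the average-cost formulation both algorithms share, here is a minimal relative value iteration sketch for a fully observable MDP; it is the textbook template, not the authors' modified algorithm, which additionally handles the POMDP case:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-8, ref=0):
    """Average-cost MDP solver (textbook form, assuming a unichain MDP).

    P: transition tensor, shape (A, S, S); c: cost matrix, shape (A, S).
    Returns the differential costs h and the average cost rho."""
    h = np.zeros(P.shape[1])
    while True:
        q = c + P @ h          # one-step lookahead, shape (A, S)
        h_new = q.min(axis=0)  # greedy over actions
        rho = h_new[ref]       # anchor at a reference state
        h_new = h_new - rho
        if np.max(np.abs(h_new - h)) < tol:
            return h_new, rho
        h = h_new
```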

Findings

The direct connection between GBPI and policy iteration is shown under a Markov decision process formulation, which yields additional analytical insight into GBPI. Furthermore, motivated by this analytical framework, the authors propose a modified value iteration as an alternative way to address the same POMDP problems handled by GBPI.

Originality/value

Several important insights are gained from the analytical framework, which motivate the development of both algorithms. Building on this paradigm, new ADP learning algorithms can be developed, in this case the modified value iteration, to address a broader class of problems, namely POMDPs. In addition, it is now possible to give ADP algorithms a gradient perspective. Inspired by this fundamental understanding of learning and optimization under the gradient-based framework, additional insight may be developed for bottom-up algorithms with performance guarantees.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 27 July 2021

Xinwang Li, Juliang Xiao, Wei Zhao, Haitao Liu and Guodong Wang

Abstract

Purpose

As complex analysis of contact models is required by traditional assembly strategies, it remains a challenge for a robot to complete multiple peg-in-hole assembly tasks autonomously. To enable the robot to complete assembly tasks autonomously and more efficiently with strategies learned by reinforcement learning (RL), this paper proposes a learning-accelerated deep deterministic policy gradient (LADDPG) algorithm.

Design/methodology/approach

The multiple peg-in-hole assembly strategy is designed in two modules: a high-level planning module and a low-level control module. The high-level module is implemented by the LADDPG agent, which derives high-level commands, namely the desired contact force, from geometric and environmental constraints. The low-level control module drives the robot to complete the compliant assembly task through an adaptive impedance algorithm, following the commands issued by the high-level module. In addition, a set of safety assurance mechanisms is developed so that a collaborative robot can be trained safely to learn autonomously.
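
The abstract does not detail the adaptive impedance law, so the sketch below is only one plausible low-level step (the gains, the adaptation rule and the names are assumptions): the error between the agent's desired contact force and the measured force is turned into a small compliant motion increment, with the stiffness adapted online:

```python
def adaptive_impedance_step(f_desired, f_measured, stiffness,
                            eta=1e-4, k_min=100.0, dt=0.002):
    """Hypothetical adaptive impedance step along one axis: command a
    velocity proportional to the contact-force error and soften the
    stiffness estimate when the error grows."""
    f_err = f_desired - f_measured
    stiffness = max(stiffness - eta * abs(f_err), k_min)  # adapt, keep a floor
    v_cmd = f_err / stiffness     # compliant velocity toward the desired force
    return v_cmd * dt, stiffness  # position increment and updated stiffness
```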

Findings

The method completes the assembly tasks well through RL and achieves satisfactory compliance of the robot with its environment. Compared with the original DDPG algorithm, the average values of the instantaneous maximum contact force and contact torque during assembly are reduced by approximately 38% and 74%, respectively.

Practical implications

The algorithm can also be applied to other robots, and the assembly strategy can be applied in the field of automatic assembly.

Originality/value

A compliant assembly strategy based on the LADDPG algorithm is proposed to complete the automated multiple peg-in-hole assembly tasks.

Details

Industrial Robot: the international journal of robotics research and application, vol. 49 no. 1
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 16 October 2018

Ke Xu, Fengge Wu and Junsuo Zhao

Abstract

Purpose

Deep reinforcement learning has been developing rapidly and has shown its power on difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still rely on classical control techniques such as proportional-integral-derivative (PID) and sliding mode control as their major solutions, and they face problems with adaptability and automation.

Design/methodology/approach

In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer learning episodes than model-free algorithms.
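
The abstract describes the approach as model-based and non-gradient (see Originality/value below). One common realization of that combination, shown here purely as an illustration under an assumed `model.step(state, action) -> (next_state, reward)` interface, is random-shooting planning over the learned dynamics model:

```python
import numpy as np

def plan_action(model, state, horizon=10, candidates=256, rng=None):
    """Gradient-free, model-based control sketch: sample candidate
    action sequences, roll each out through the learned dynamics model
    and execute the first action of the best-scoring sequence."""
    rng = rng or np.random.default_rng()
    best_action, best_return = None, -np.inf
    for _ in range(candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 3))  # 3-axis torques
        s, total = state, 0.0
        for a in actions:
            s, r = model.step(s, a)  # assumed learned-model interface
            total += r
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```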

Findings

Simulation experiments show that when classical control fails, this approach can find a solution and reach the target within hundreds of rounds of exploration and learning.

Originality/value

This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it needs no prior knowledge of the satellite or its orbit, can adapt to different situations by learning from data and can adapt to different satellites and tasks through transfer learning.

Details

Industrial Robot: the international journal of robotics research and application, vol. 46 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 1 April 2024

Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao

Abstract

Purpose

This paper aims to study agent learning from expert demonstration data combined with reinforcement learning (RL), which enables the agent to break through the limitations of the expert demonstration data and reduces the dimensionality of the agent's exploration space to speed up training convergence.

Design/methodology/approach

First, a decay weight function is set in the objective function of the agent's training to combine the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent's learning process, improving the data utilization rate and the agent's learning speed.
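
The exact decay schedule is not given in the abstract; a minimal sketch of a decay-weighted objective (the linear schedule and the names are assumptions) is:

```python
def combined_loss(rl_loss, il_loss, step, decay=1e-4):
    """Hypothetical decay-weighted training objective: early in training
    the imitation term dominates, so the agent follows the expert
    demonstrations; as the weight decays, the RL term takes over and the
    agent can explore beyond the demonstration data."""
    w = max(0.0, 1.0 - decay * step)  # imitation weight, decays to zero
    return (1.0 - w) * rl_loss + w * il_loss
```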

Findings

The method is superior to other algorithms in convergence speed and decision stability, avoids having to learn reward values from scratch and breaks through the restrictions imposed by the demonstration data.

Originality/value

The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms built on the experience of the demonstration trajectories. The demonstration data set used in IL and the experience samples obtained during RL are coupled to improve data utilization efficiency and the agent's generalization ability.

Details

International Journal of Web Information Systems, vol. 20 no. 3
Type: Research Article
ISSN: 1744-0084

Book part
Publication date: 13 March 2023

Xiao Liu

Abstract

The expansion of marketing data is encouraging the growing use of deep learning (DL) in marketing. I summarize the intuition behind deep learning and explain the mechanisms of six popular algorithms: three discriminative (convolutional neural network (CNN), recurrent neural network (RNN) and Transformer), two generative (variational autoencoder (VAE) and generative adversarial network (GAN)) and one reinforcement learning algorithm (deep Q-network, DQN). I discuss which marketing problems DL is useful for and what has fueled its growth in recent years. I emphasize the power and flexibility of DL for modeling unstructured data when formal theories and knowledge are absent. I also describe future research directions.
