Search results
1 – 10 of over 1000
Abstract
Purpose
Many practical control problems require achieving multiple objectives, and these objectives often conflict with each other. Existing multi-objective evolutionary reinforcement learning algorithms achieve poor search results on such problems, so it is necessary to design a new multi-objective evolutionary reinforcement learning algorithm with stronger search capability.
Design/methodology/approach
The multi-objective reinforcement learning algorithm proposed in this paper is based on an evolutionary computation framework. In each generation, a long-short-term selection method chooses the parent policies. Long-term selection is based on each policy's improvement along its predefined optimization direction in the previous generation; short-term selection uses a prediction model to predict the optimization direction likely to yield the greatest improvement in overall population performance. In the evolutionary stage, a penalty-based nonlinear scalarization method scalarizes the multi-dimensional advantage functions, and a nonlinear multi-objective policy gradient is designed to optimize the parent policies along the predefined directions.
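The penalty-based scalarization of a multi-dimensional advantage along a predefined direction might look like the following sketch, where the projection-plus-quadratic-penalty form and the `penalty` coefficient are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def penalized_scalarize(advantages, direction, penalty=10.0):
    """Scalarize a vector of per-objective advantages: reward the component
    along the predefined optimization direction and nonlinearly penalize
    deviation from it (hypothetical form, not the paper's exact formula)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)              # unit optimization direction
    a = np.asarray(advantages, dtype=float)
    parallel = float(a @ d)                # improvement along the direction
    orthogonal = a - parallel * d          # deviation from the direction
    return parallel - penalty * float(np.linalg.norm(orthogonal)) ** 2
```

A policy whose advantage vector points along its assigned direction then scores higher than one of similar magnitude that drifts off-direction, which is what forces improvement along the predefined directions.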
Findings
The penalty-based nonlinear scalarization method can force policies to improve along the predefined optimization directions. The long-short-term optimization method can alleviate the exploration-exploitation problem, enabling the algorithm to explore unknown regions while ensuring that potential policies are fully optimized. The combination of these designs can effectively improve the performance of the final population.
Originality/value
A multi-objective evolutionary reinforcement learning algorithm with stronger search capability is proposed. It can find a Pareto policy set with better convergence, diversity and density.
Hamed Shahbazi, Kamal Jamshidi and Amir Hasan Monadjemi
Abstract
Purpose
The purpose of this paper is to model a motor region named the mesencephalic locomotor region (MLR), which is located at the end part of the brain and the first part of the spinal cord. This model will be used for a Nao soccer-playing humanoid robot. It consists of three main parts: a High Level Decision Unit (HLDU), an MLR-Learner and the CPG layer. The authors focus on a special type of decision making named curvilinear walking.
Design/methodology/approach
The authors' model is based on stimulation of some programmable central pattern generators (PCPGs) to generate curvilinear bipedal walking patterns. The PCPGs are made from adaptive Hopf oscillators. The high-level decision, i.e. curvilinear bipedal walking, is formulated as a policy-gradient learning problem over some free parameters of the robot's CPG controller.
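A single Hopf oscillator of the kind such PCPGs are built from can be sketched as below; the amplitude parameter `mu`, frequency `omega` and Euler step `dt` are generic assumptions, not the authors' robot-specific values:

```python
import numpy as np

def hopf_step(x, y, mu=1.0, omega=2.0 * np.pi, dt=0.001):
    """One Euler step of a Hopf oscillator: trajectories converge to a
    stable limit cycle of radius sqrt(mu) at angular frequency omega."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

# starting near the origin, the state settles onto the unit circle
x, y = 0.1, 0.0
for _ in range(20000):
    x, y = hopf_step(x, y)
```

The x (or y) component then provides a smooth rhythmic signal that can drive joint trajectories; coupling several such units with phase offsets is the usual way to form a walking pattern.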
Findings
The paper provides a basic model for generating different types of motion in humanoid robots using only simple stimulation of a CPG layer. A suitable and fast curvilinear walk, similar to ordinary human walking, has been achieved on a Nao humanoid robot. This model can be extended and used in other types of humanoid robot.
Research limitations/implications
The authors' work is limited to a special type of biped locomotion. Other types of motion remain to be tested and evaluated with this model.
Practical implications
The paper introduces a bio-inspired model of skill learning for humanoid robots. It is used to generate curvilinear bipedal walking patterns, a beneficial movement for soccer-playing Nao robots in RoboCup competitions.
Originality/value
The paper uses a new biological motor concept in artificial humanoid robots, which is the mesencephalic locomotor region.
Abstract
Purpose
English original movies play an important role in English learning and communication. To help users find the movies they need among a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms solve the problem of information overload, they still have limitations in the cases of cold start and sparse data.
Design/methodology/approach
To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to address the cold-start and sparse-data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to keep the neural network from converging to a local optimum prematurely.
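A reward mixing cosine similarity with a squashed Euclidean-distance term might be sketched as follows; the mixing weight `alpha` and the `1/(1+d)` squashing are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def reward(action_vec, item_vec, alpha=0.5):
    """Hypothetical reward between the policy's action embedding and a
    target item embedding: high when the vectors point the same way
    (cosine term) and are close in space (Euclidean term)."""
    a = np.asarray(action_vec, dtype=float)
    b = np.asarray(item_vec, dtype=float)
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    euc = 1.0 / (1.0 + float(np.linalg.norm(a - b)))  # 1.0 when identical
    return alpha * cos + (1.0 - alpha) * euc
```

Combining the two terms rewards both the direction and the position of the action embedding, giving a denser signal than either distance alone.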
Findings
To verify the feasibility and validity of the proposed algorithm, it is compared with state-of-the-art algorithms in terms of RMSE, recall rate and accuracy in experiments on the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithms on all indicators.
Originality/value
Applied to the recommendation of English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
Yanbiao Zou and Hengchang Zhou
Abstract
Purpose
This paper aims to propose a weld seam tracking method based on proximal policy optimization (PPO).
Design/methodology/approach
A neural network based on PPO is constructed, with the reference image block and the image block to be detected as its dual-channel input; the network predicts the translation between the two images and corrects the location of feature points in the weld image. A localization accuracy estimation network (LAE-Net) is built to update the reference image block during the welding process, which helps reduce the tracking error.
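The dual-channel input itself is just the two image blocks stacked along the channel axis; a minimal sketch (shapes and channel-first layout assumed, the PPO network itself not reproduced):

```python
import numpy as np

def make_dual_channel(ref_block, det_block):
    """Stack the reference block and the block to be detected into one
    two-channel array, the form a registration network would consume."""
    ref = np.asarray(ref_block, dtype=np.float32)
    det = np.asarray(det_block, dtype=np.float32)
    assert ref.shape == det.shape, "both blocks must have the same size"
    return np.stack([ref, det], axis=0)    # shape: (2, H, W)

def correct_feature_point(point, predicted_shift):
    """Apply the network's predicted translation to a feature point."""
    return (point[0] + predicted_shift[0], point[1] + predicted_shift[1])
```

Once the network predicts the (dx, dy) translation between the two channels, correcting a feature point is a plain coordinate shift, as in `correct_feature_point`.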
Findings
Off-line simulation results show that the proposed algorithm is robust and performs well on a test set of curved-seam images with strong noise. In the welding experiment, the movement of the welding torch is stable, the molten material is uniform and smooth, and the welding error is small enough to meet the requirements of industrial production.
Originality/value
The idea of image registration is applied to weld seam tracking, and the tracking network is built on the basis of PPO. To further improve tracking accuracy, the LAE-Net is constructed so that the reference images can be updated.
Yangmin Xie, Qiaoni Yang, Rui Zhou, Zhiyan Cao and Hang Shi
Abstract
Purpose
Fast obstacle avoidance path planning is a challenging task for multijoint robots navigating through cluttered workspaces. This paper aims to address this issue by proposing an improved path-planning method based on the distorted space (DS) method, specifically designed for high-dimensional complex environments.
Design/methodology/approach
The proposed method, termed topology-preserved distorted space (TP-DS) method, mitigates the limitations of the original DS method by preserving space topology through elastic deformation. By applying distinct spring constants, the TP-DS autonomously shrinks obstacles to microscopic areas within the configuration space, maintaining consistent topology. This enhancement extends the application scope of the DS method to handle complex environments effectively.
Findings
Comparative analysis demonstrates that the proposed TP-DS method outperforms traditional methods in planning efficiency. Successful obstacle-avoidance tasks in a cluttered workspace validate its applicability on a physical 6-DOF manipulator, highlighting its potential for industrial implementation.
Originality/value
The novel TP-DS method generates a topology-preserved collision-free space by leveraging elastic deformation and shows significant capability and efficiency in planning obstacle-avoidance paths in complex application scenarios.
Hongbo Zhu, Minzhou Luo, Jianghai Zhao and Tao Li
Abstract
Purpose
The purpose of this paper is to present a soft-landing control strategy for a biped robot to avoid and absorb the impulsive reaction forces (which weaken walking stability) caused by the landing impact between the swing foot and the ground.
Design/methodology/approach
First, a suitable trajectory of the swing foot is preplanned to avoid impulsive reaction forces in the walking direction. Second, the impulsive reaction forces of the landing impact are suppressed by on-line trajectory modification based on extended time-domain passivity control with admittance causality, which takes the reaction forces as inputs and outputs the decomposed swing-foot positions that trim off those forces.
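Admittance causality of this kind — reaction force in, position correction out — can be sketched for one axis as a virtual mass-damper-spring; the gains `M`, `B`, `K` and step `dt` are illustrative, and the paper's extended time-domain passivity logic (which adapts such behavior on-line) is not reproduced:

```python
def admittance_step(f_ext, z, dz, M=5.0, B=50.0, K=200.0, dt=0.002):
    """One step of M*z'' + B*z' + K*z = f_ext: the measured landing force
    drives a displacement z that trims the swing foot's planned position."""
    ddz = (f_ext - B * dz - K * z) / M
    dz = dz + dt * ddz                     # semi-implicit Euler
    z = z + dt * dz
    return z, dz

# a constant 20 N landing force yields a steady offset of f/K = 0.1
z, dz = 0.0, 0.0
for _ in range(2000):
    z, dz = admittance_step(20.0, z, dz)
```

The damper term is what absorbs the impulsive component: a sharp force spike produces a compliant retreat of the foot rather than a rigid rebound.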
Findings
The experimental data and results are described and analyzed, showing that the proposed soft-landing control strategy can suppress the impulsive forces and improve walking stability.
Originality/value
The main contribution is that a soft landing control strategy for a biped robot was proposed to deal with the impulsive reaction forces generated by the landing impact, which enhances walking stability.
Lei Yang, James Dankert and Jennie Si
Abstract
Purpose
The purpose of this paper is to develop a mathematical framework that addresses some algorithmic features of approximate dynamic programming (ADP) by using an average-cost formulation based on the concepts of differential costs and performance gradients. Under this framework, a modified value iteration algorithm is developed that is easy to implement and, at the same time, can address a class of partially observable Markov decision processes (POMDPs).
Design/methodology/approach
Gradient-based policy iteration (GBPI) is a top-down, system-theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom-up, algorithmic view is provided to complement the original high-level development of GBPI. A modified value iteration is introduced, which can solve the same type of POMDP problems dealt with by GBPI. Numerical simulations on a queuing problem and a maze problem illustrate and verify features of the proposed algorithms as compared to GBPI.
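The differential-cost idea can be illustrated with textbook relative value iteration for an average-cost MDP; this is a generic sketch of the formulation the paper starts from, not its modified, POMDP-capable algorithm:

```python
import numpy as np

def relative_value_iteration(P, C, iters=500, ref=0):
    """P[a]: transition matrix under action a; C[a]: per-state cost.
    Subtracting the value at a reference state keeps the iterates bounded
    and returns the average cost (gain) plus differential costs h."""
    n = P[0].shape[0]
    h = np.zeros(n)
    gain = 0.0
    for _ in range(iters):
        q = np.stack([C[a] + P[a] @ h for a in range(len(P))])
        h_new = q.min(axis=0)          # greedy Bellman backup
        gain = h_new[ref]              # estimate of the average cost
        h = h_new - gain               # differential (relative) costs
    return gain, h

# two states, two actions: "stay" (costs 1 and 2) vs "switch" (cost 0.5);
# alternating between the states is optimal, with average cost 0.5
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
C = [np.array([1.0, 2.0]), np.array([0.5, 0.5])]
gain, h = relative_value_iteration(P, C)
```

The differential costs `h` play the role of the relative-value terms the paper's performance-gradient analysis is built around.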
Findings
The direct connection between GBPI and policy iteration is shown under a Markov decision process formulation, yielding additional analytical insight into GBPI. Furthermore, motivated by this analytical framework, the authors propose a modified value iteration as an alternative for addressing the same POMDP problems handled by GBPI.
Originality/value
Several important insights gained from the analytical framework motivate the development of both algorithms. Building on this paradigm, new ADP learning algorithms can be developed, in this case the modified value iteration, to address a broader class of problems, the POMDPs. In addition, it is now possible to give ADP algorithms a gradient perspective. Inspired by this fundamental understanding of learning and optimization problems under the gradient-based framework, additional insight may be developed for bottom-up algorithms with performance guarantees.
Xinwang Li, Juliang Xiao, Wei Zhao, Haitao Liu and Guodong Wang
Abstract
Purpose
As complex analysis of contact models is required in traditional assembly strategies, it remains a challenge for a robot to complete multiple peg-in-hole assembly tasks autonomously. To enable the robot to complete assembly tasks autonomously and more efficiently with strategies learned by reinforcement learning (RL), this paper proposes a learning-accelerated deep deterministic policy gradient (LADDPG) algorithm.
Design/methodology/approach
The multiple peg-in-hole assembly strategy is designed in two modules: a high-level planning module and a low-level control module. The high-level module is implemented by the LADDPG agent, which derives high-level commands, that is, the desired contact force, from geometric and environmental constraints. The low-level control module drives the robot to complete the compliant assembly task through an adaptive impedance algorithm, according to the commands issued by the high-level module. In addition, a set of safety assurance mechanisms is developed so that a collaborative robot can be trained safely to learn autonomously.
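The division of labor — an agent issuing a desired contact force, a compliant loop tracking it — can be sketched with a toy force-tracking position update; the gain `kf`, the linear contact model and the single-axis setup are assumptions for illustration, not the LADDPG agent or the paper's adaptive impedance law:

```python
def impedance_position_update(x_cmd, f_desired, f_measured, kf=0.0005):
    """Low-level loop: nudge the commanded insertion depth until the
    measured contact force matches the force the agent asked for."""
    return x_cmd + kf * (f_desired - f_measured)

# environment modeled as a linear spring: f = stiffness * x
stiffness = 1000.0
x = 0.0
for _ in range(200):
    f_meas = stiffness * x
    x = impedance_position_update(x, f_desired=5.0, f_measured=f_meas)
# x converges toward f_desired / stiffness = 0.005
```

Because the robot regulates force rather than position, a small misalignment changes the measured force and the commanded depth backs off, which is what keeps peak contact forces low during insertion.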
Findings
The method completes the assembly tasks well through RL and achieves satisfactory compliance of the robot with the environment. Compared with the original DDPG algorithm, the average values of the instantaneous maximum contact force and contact torque during assembly are reduced by approximately 38% and 74%, respectively.
Practical implications
The entire algorithm can also be applied to other robots, and the assembly strategy can be applied in the field of automatic assembly.
Originality/value
A compliant assembly strategy based on the LADDPG algorithm is proposed to complete the automated multiple peg-in-hole assembly tasks.
Ke Xu, Fengge Wu and Junsuo Zhao
Abstract
Purpose
Recently, deep reinforcement learning has been developing rapidly and has shown its power on difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still use classical control techniques such as proportional-integral-derivative and sliding mode control as their major solutions, and face problems with adaptability and automation.
Design/methodology/approach
In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer episodes of learning than model-free algorithms.
Findings
Simulation experiments show that when classical control fails, this approach can find a solution and reach the target within hundreds of episodes of exploration and learning.
Originality/value
This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it needs no prior knowledge of the satellite or its orbit, can adapt to different situations by learning from data, and can adapt to different satellites and tasks through transfer learning.
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao
Abstract
Purpose
This paper aims to study how an agent can learn from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of the demonstration data and reduces the dimensionality of the agent's exploration space to speed up training convergence.
Design/methodology/approach
First, a decay weight function is set in the objective function of the agent's training to combine the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent's learning, improving both data utilization and the agent's learning speed.
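The decay-weight combination of the IL and RL terms might be sketched as below; the exponential schedule and its rate are assumptions standing in for the paper's unspecified decay function:

```python
import math

def combined_loss(rl_loss, il_loss, step, decay_rate=1e-3):
    """Blend imitation and reinforcement objectives: the IL weight starts
    at 1 and decays toward 0, so RL gradually takes over the update."""
    w = math.exp(-decay_rate * step)
    return w * il_loss + (1.0 - w) * rl_loss
```

Early updates are dominated by the demonstration data; once `w` is near zero, the agent is free to improve beyond what the expert showed.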
Findings
The method is superior to other algorithms in terms of convergence speed and decision stability, avoids training reward values from scratch, and breaks through the restrictions imposed by the demonstration data.
Originality/value
The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms built on the experience of the demonstration trajectories. The demonstration data set used in IL and the experience samples obtained during RL are coupled to improve data utilization efficiency and the agent's generalization ability.