Search results

1 – 10 of over 19000
Article
Publication date: 17 October 2008

Lei Yang, James Dankert and Jennie Si

Abstract

Purpose

The purpose of this paper is to develop a mathematical framework to address some algorithmic features of approximate dynamic programming (ADP) by using an average cost formulation based on the concepts of differential costs and performance gradients. Under such a framework, a modified value iteration algorithm is developed that is easy to implement and can also address a class of partially observable Markov decision processes (POMDPs).

Design/methodology/approach

Gradient‐based policy iteration (GBPI) is a top‐down, system‐theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom‐up, algorithmic view is provided to complement the original high‐level development of GBPI. A modified value iteration is introduced, which can provide solutions to the same type of POMDP problems dealt with by GBPI. Numerical simulations, including a queuing problem and a maze problem, are conducted to illustrate and verify features of the proposed algorithms as compared to GBPI.
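
For orientation, the average cost formulation with differential costs can be illustrated with textbook relative value iteration on a small, fully observed MDP. This is a minimal sketch, not the paper's modified value iteration and not GBPI; the two-state transition probabilities `P`, the costs `c` and the function name are made up for illustration.

```python
def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Textbook relative value iteration for an average-cost MDP.

    P[a][s][j]: probability of moving from state s to j under action a.
    c[s][a]:    one-step cost of taking action a in state s.
    Returns (gain, h, policy): the average cost per step, the differential
    costs (normalised so h[ref_state] == 0) and a greedy policy.
    """
    n_states = len(c)
    n_actions = len(c[0])

    def backup(h):
        # q[s][a] = c[s][a] + sum_j P[a][s][j] * h[j]
        return [[c[s][a] + sum(P[a][s][j] * h[j] for j in range(n_states))
                 for a in range(n_actions)] for s in range(n_states)]

    h = [0.0] * n_states
    for _ in range(max_iter):
        th = [min(row) for row in backup(h)]
        # Subtract the value at the reference state to keep h bounded.
        h_new = [v - th[ref_state] for v in th]
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break

    q = backup(h)
    gain = min(q[ref_state]) - h[ref_state]
    policy = [row.index(min(row)) for row in q]
    return gain, h, policy


# Hypothetical two-state, two-action example (all numbers are assumptions).
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.5, 0.5], [0.1, 0.9]],  # action 1
]
c = [[0.5, 2.0], [3.0, 1.0]]
gain, h, policy = relative_value_iteration(P, c)
```

At convergence the returned values satisfy the average cost optimality equation, gain + h(s) = min over a of [c(s, a) + sum over j of P(j | s, a) h(j)], which is the sense in which h holds differential costs.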

Findings

The direct connection between GBPI and policy iteration is shown under a Markov decision process formulation. As such, additional analytical insights are gained on GBPI. Furthermore, motivated by this analytical framework, the authors propose a modified value iteration as an alternative for addressing the same POMDP problem handled by GBPI.

Originality/value

Several important insights are gained from the analytical framework, which motivate the development of both algorithms. Built on this paradigm, new ADP learning algorithms, in this case the modified value iteration, can be developed to address a broader class of problems, namely POMDPs. In addition, it is now possible to provide ADP algorithms with a gradient perspective. Inspired by the fundamental understanding of learning and optimization problems under the gradient‐based framework, additional insights may be developed for bottom‐up algorithms with performance guarantees.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 16 October 2018

Ke Xu, Fengge Wu and Junsuo Zhao

Abstract

Purpose

Recently, deep reinforcement learning has developed rapidly and shown its power on difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still use classical control techniques such as proportional-integral-derivative (PID) and sliding mode control as their major solutions, and face problems with adaptability and automation.

Design/methodology/approach

In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of the satellite control system. It is a model-based algorithm that can find solutions with fewer episodes of learning than model-free algorithms.

Findings

Simulation experiments show that when classical control fails, this approach can find a solution and reach the target after hundreds of rounds of exploration and learning.

Originality/value

This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it does not need prior knowledge of the satellite or its orbit, can adapt to different kinds of situations by learning from data and can adapt to different kinds of satellites and tasks through transfer learning.

Details

Industrial Robot: the international journal of robotics research and application, vol. 46 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 17 August 2012

Shao Zhifei and Er Meng Joo

Abstract

Purpose

The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL).

Design/methodology/approach

Reinforcement learning (RL) techniques provide a powerful solution for sequential decision making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, needs to be provided beforehand. In practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems. This inspired the development of IRL, an extension of RL, which directly tackles this problem by learning the reward function through expert demonstrations. In this paper, the original IRL algorithms and their close variants, as well as their recent advances, are reviewed and compared.

Findings

This paper can serve as an introductory guide to the fundamental theory and developments, as well as the applications, of IRL.

Originality/value

This paper surveys the theories and applications of IRL, which is the latest development of RL; no such survey has been done so far.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 5 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 20 January 2020

Pardis Pourghomi, Milan Dordevic and Fadi Safieddine

Abstract

Purpose

In March 2019, Facebook updated its security procedures, requesting ID verification for people who wish to advertise or promote political posts or adverts. The announcement received little media coverage even though it is an interesting development in the battle against fake news. This paper aims to review the current literature on different approaches in the battle against the spread of fake news, including the use of computer algorithms, artificial intelligence (AI) and the introduction of ID checks.

Design/methodology/approach

Critical to the evaluation is the consideration of ID checks as a means to combat the spread of fake news. To understand how the process works, the team undertook a social experiment combined with reflective analysis of the impact of ID check policies when combined with the other standard policies of a typical platform.

Findings

The analysis identifies grave concerns. In a wider context, standardising such policy will leave political activists in countries vulnerable to reprisal from authoritarian regimes. Other victims of the impacts include people who use fake names to protect the identity of adopted children or to protect anonymity from abusive partners.

Originality/value

The analysis also points to the fact that troll armies could bypass these checks, rendering ID checks less effective in the battle against fake news.

Details

International Journal of Pervasive Computing and Communications, vol. 16 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 16 July 2019

Donghee (Don) Shin, Anestis Fotiadis and Hongsik Yu

Abstract

Purpose

The purpose of this study is to offer a roadmap for work on the ethical and societal implications of algorithms and AI. Based on an analysis of the social, technical and regulatory challenges posed by algorithmic systems in Korea, this work conducts socioecological evaluations of the governance of algorithmic transparency and accountability.

Design/methodology/approach

This paper analyzes algorithm design and development from critical socioecological angles: social, technological, cultural and industrial phenomena that represent the strategic interaction among people, technology and society, touching on sensitive issues of a legal, a cultural and an ethical nature.

Findings

Algorithm technologies are part of a social ecosystem, and their development should be based on user interests and rights within a social and cultural milieu. An algorithm represents an interrelated, multilayered ecosystem of networks, protocols, applications, services, practices and users.

Practical implications

Value-sensitive algorithm design is proposed as a novel approach for designing algorithms. As algorithms have become a constitutive technology that shapes human life, it is essential to be aware of the value-ladenness of algorithm development. Human values and social issues can be reflected in an algorithm design.

Originality/value

The arguments in this study help ensure the legitimacy and effectiveness of algorithms. This study provides insight into the challenges and opportunities of algorithms through the lens of a socioecological analysis: political discourse, social dynamics and technological choices inherent in the development of algorithm-based ecology.

Details

Digital Policy, Regulation and Governance, vol. 21 no. 4
Type: Research Article
ISSN: 2398-5038

Article
Publication date: 1 April 2024

Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao

Abstract

Purpose

This paper aims to study how an agent can learn from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of expert demonstration data and reduces the dimensionality of the agent’s exploration space to speed up training convergence.

Design/methodology/approach

First, a decay weight function is set in the objective function of the agent’s training to combine both types of methods, so that both RL and imitation learning (IL) guide the agent’s behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent’s learning process, improving the utilization rate of the data and the agent’s learning speed.
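
One common way to realise such a decaying weight is to blend the two losses so that imitation dominates early and RL dominates later. The sketch below assumes an exponential schedule and a convex combination; the abstract specifies neither, and the function names, `w0` and `decay_rate` are hypothetical.

```python
import math


def il_weight(step, w0=1.0, decay_rate=0.01):
    """Imitation-learning weight at a training step (assumed exponential decay).

    w0 is the initial weight and should lie in [0, 1].
    """
    return w0 * math.exp(-decay_rate * step)


def combined_loss(rl_loss, il_loss, step, w0=1.0, decay_rate=0.01):
    """Blend RL and IL losses; the IL share shrinks as training proceeds."""
    w = il_weight(step, w0, decay_rate)
    return (1.0 - w) * rl_loss + w * il_loss


# Early in training the IL term dominates; later the RL term takes over.
early = combined_loss(rl_loss=2.0, il_loss=4.0, step=0)
late = combined_loss(rl_loss=2.0, il_loss=4.0, step=10_000)
```

With this schedule the agent starts close to the demonstrations and is gradually freed to explore beyond them, which matches the stated goal of breaking through the limits of the demonstration data.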

Findings

The method is superior to other algorithms in terms of convergence speed and decision stability, avoiding training from scratch for reward values, and breaking through the restrictions brought by demonstration data.

Originality/value

The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms based on the experience contained in the demonstration trajectories. The demonstration data set used in IL and the experience samples obtained during RL are coupled and used together to improve the data utilization efficiency and the generalization ability of the agent.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 May 2019

Adolfo Perrusquía, Wen Yu and Alberto Soria

Abstract

Purpose

The position/force control of the robot needs the parameters of the impedance model and generates the desired position from the contact force in the environment. When the environment is unknown, learning algorithms are needed to estimate both the desired force and the parameters of the impedance model.

Design/methodology/approach

In this paper, the authors use reinforcement learning to learn only the desired force, then they use proportional-integral-derivative admittance control to generate the desired position. The results of the experiment are presented to verify their approach.

Findings

The position error is minimized without knowing the environment or the impedance parameters. Another advantage of this simplified position/force control is that the transformation of the Cartesian space to the joint space by inverse kinematics is avoided by the feedback control mechanism. The stability of the closed-loop system is proven.

Originality/value

The position error is minimized without knowing the environment or the impedance parameters. The stability of the closed-loop system is proven.

Details

Industrial Robot: the international journal of robotics research and application, vol. 46 no. 2
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 15 September 2023

Xiaohan Xu, Xudong Huang, Ke Zhang and Ming Zhou

Abstract

Purpose

In general, the existing compressor design methods require abundant knowledge and inspiration. The purpose of this study is to identify an intellectual design optimization method that enables a machine to learn how to design it.

Design/methodology/approach

The airfoil design process was solved using the reinforcement learning (RL) method. An intellectual method based on a modified deep deterministic policy gradient (DDPG) algorithm was implemented. The new method was applied to agents to learn the design policy under dynamic constraints. The agents explored the design space with the help of a surrogate model and airfoil parameterization.

Findings

The agents successfully learned to design the airfoils. The loss coefficients of a controlled diffusion airfoil improved by 1.25% and 3.23% in the two- and four-dimensional design spaces, respectively. The agents successfully learned to design under various constraints. Additionally, the modified DDPG method was compared with a genetic algorithm optimizer, verifying that the former was one to two orders of magnitude faster in policy searching. The NACA65 airfoil was redesigned to verify the generalization.

Originality/value

It is feasible to consider the compressor design as an RL problem. Trained agents can determine and record the design policy and adapt it to different initiations and dynamic constraints. More intelligence is demonstrated than when traditional optimization methods are used. This methodology represents a new, small step toward the intelligent design of compressors.

Details

Engineering Computations, vol. 40 no. 9/10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 7 September 2012

Joy P. Vazhayil and R. Balasubramanian

Abstract

Purpose

Optimization of energy planning for growth and sustainable development has become very important in the context of climate change mitigation imperatives in developing countries. Existing models do not capture developing country realities adequately. The purpose of this paper is to conceptualize a framework for energy strategy optimization of the Indian energy sector, which can be applied in all emerging economies.

Design/methodology/approach

Hierarchical multi‐objective policy optimization methodology adopts a policy‐centric approach and groups the energy strategies into multi‐level portfolios based on convergence of objectives appropriate to each level. This arrangement facilitates application of the optimality principle of dynamic programming. Synchronised optimization of strategies with respect to the common objectives at each level results in optimal policy portfolios.

Findings

The reductionist policy‐centric approach to complex energy economy modelling, facilitated by the dynamic programming methodology, is most suitable for policy optimization in the context of a developing country. Barriers to project implementation and cost risks are critical features of developing countries which are captured in the framework in the form of a comprehensive risk barrier index. Genetic algorithms are suitable for optimization of the first level objectives, while the efficiency approach, using restricted weight stochastic data envelopment analysis, is appropriate for higher levels of the objective hierarchy.

Research limitations/implications

The methodology has been designed for application to the energy sector planning for India's 12th Five Year Plan for which the objectives of faster growth, better inclusion, energy security and sustainability have been identified. The conceptual framework combines, within the policy domain, the bottom‐up and top‐down processes to form a hybrid modelling approach yielding optimal outcomes, transparent and convincing to the policy makers. The research findings have substantial implications for transition management to a sustainable energy framework.

Originality/value

The methodology is general in nature and can be employed in all sectors of the economy. It is especially suited to policy design in developing countries with the ground realities factored into the model as project barriers. It offers modularity and flexibility in implementation and can accommodate all the key strategies from diverse sectors along with multiple objectives in the policy optimization process. It enables adoption of an evidence‐based and transparent approach to policy making. The research findings have substantial value for transition management to a sustainable energy framework in developing countries.

Details

International Journal of Energy Sector Management, vol. 6 no. 3
Type: Research Article
ISSN: 1750-6220

Article
Publication date: 24 September 2021

Guanzheng Wang, Yinbo Xu, Zhihong Liu, Xin Xu, Xiangke Wang and Jiarun Yan

Abstract

Purpose

This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to deal with the problem of low sample efficiency in DRL and speed up the training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems.

Design/methodology/approach

In this paper, a fully distributed collision detection and avoidance approach for multi-UAV based on DRL is proposed. A method that integrates human experience into policy training via a human experience-based adviser is proposed. The authors propose a hybrid control method which combines the learning-based policy with traditional model-based control. Extensive experiments including simulations, real flights and comparative experiments are conducted to evaluate the performance of the approach.

Findings

A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that the training process is significantly accelerated when human experience is integrated, and the mean episode reward is higher than with the pure DRL method. The experimental results show that the DRL method with human experience integration significantly outperforms the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight brought by the hybrid control method has also been validated.

Originality/value

The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared to the pure DRL method. The proposed hybrid control strategy makes up for the shortcomings of two-dimensional light detection and ranging (LiDAR) and other difficulties in applications.

Details

Industrial Robot: the international journal of robotics research and application, vol. 49 no. 2
Type: Research Article
ISSN: 0143-991X
