Lei Yang, James Dankert and Jennie Si
Abstract
Purpose
The purpose of this paper is to develop a mathematical framework to address some algorithmic features of approximate dynamic programming (ADP) by using an average cost formulation based on the concepts of differential costs and performance gradients. Under such a framework, a modified value iteration algorithm is developed that is easy to implement and, at the same time, can address a class of partially observable Markov decision processes (POMDP).
Design/methodology/approach
Gradient‐based policy iteration (GBPI) is a top‐down, system‐theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom‐up, algorithmic view is provided to complement the original high‐level development of GBPI. A modified value iteration is introduced, which can provide solutions to the same type of POMDP problems dealt with by GBPI. Numerical simulations, including a queuing problem and a maze problem, are conducted to illustrate and verify features of the proposed algorithms in comparison with GBPI.
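The average cost setting with differential costs can be made concrete with standard relative value iteration, which repeatedly applies the Bellman backup and subtracts the value at a reference state so that the iterates converge to the differential costs h and the average cost (gain). This is a minimal sketch for a fully observed, finite MDP; it is not the authors' modified value iteration or GBPI, and the function name, the two-state MDP and its costs are hypothetical.

```python
def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Average-cost relative value iteration (minimisation).

    P[a][s][s2]: probability of moving from s to s2 under action a.
    c[a][s]:     one-step cost of taking action a in state s.
    Returns (average cost estimate, differential costs h, greedy policy).
    """
    n_actions, n_states = len(P), len(P[0])
    h = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman backup: immediate cost plus expected differential cost.
        q = [[c[a][s] + sum(P[a][s][s2] * h[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        t_h = [min(q[s]) for s in range(n_states)]
        gain = t_h[ref]                    # current average-cost estimate
        new_h = [v - gain for v in t_h]    # normalise so that h[ref] == 0
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    policy = [min(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return gain, h, policy


# Hypothetical two-state, two-action MDP (action 0 = "stay", action 1 = "switch").
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.9, 0.1]],
]
c = [
    [1.0, 4.0],
    [2.0, 3.0],
]
gain, h, policy = relative_value_iteration(P, c)
print(gain, h, policy)
```

Enumerating the four deterministic policies by hand shows that the cheapest one stays in state 0 and switches in state 1, with average cost 1.2 and h = (0, 2), which is what the iteration returns here. Relative value iteration relies on an aperiodic unichain MDP; every transition matrix in this toy example has strictly positive entries, so that assumption holds.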
Findings
The direct connection between GBPI and policy iteration is shown under a Markov decision process formulation, which yields additional analytical insight into GBPI. Furthermore, motivated by this analytical framework, the authors propose a modified value iteration as an alternative way of addressing the same POMDP problem handled by GBPI.
Originality/value
Several important insights are gained from the analytical framework, which motivate the development of both algorithms. Building on this paradigm, new ADP learning algorithms, in this case the modified value iteration, can be developed to address a broader class of problems, namely POMDPs. In addition, it is now possible to view ADP algorithms from a gradient perspective. Inspired by this fundamental understanding of learning and optimization problems under the gradient‐based framework, further insight may be developed into bottom‐up algorithms with performance guarantees.
Ke Xu, Fengge Wu and Junsuo Zhao
Abstract
Purpose
Deep reinforcement learning has developed rapidly in recent years and has shown its power on difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still rely mainly on classical control techniques such as proportional-integral-derivative (PID) and sliding mode control, which face problems with adaptability and automation.
Design/methodology/approach
In this paper, an approach based on deep reinforcement learning is proposed to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer learning episodes than model-free algorithms.
Findings
Simulation experiments show that, where classical control failed, this approach could find a solution and reach the target after hundreds of rounds of exploration and learning.
Originality/value
This approach is a gradient-free method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it needs no prior knowledge of the satellite or its orbit, can adapt to different situations by learning from data and can adapt to different satellites and tasks through transfer learning.
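The abstract does not say which heuristic search is used. As an illustrative stand-in only, the sketch below uses the cross-entropy method, a well-known gradient-free optimizer that samples candidate parameters, keeps the best-scoring ones and refits its sampling distribution to them. The quadratic cost is a hypothetical placeholder for, say, the negative return of a policy with parameters θ evaluated in a satellite simulator; the function name and all settings are assumptions, not the authors' algorithm.

```python
import random
import statistics


def cross_entropy_method(cost, dim, iters=40, pop=64, n_elite=8, init_std=3.0, seed=0):
    """Gradient-free minimisation of `cost` over a parameter vector of length `dim`."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(iters):
        # Sample a population of candidate parameter vectors.
        samples = [[rng.gauss(mean[i], std[i]) for i in range(dim)] for _ in range(pop)]
        # Keep the lowest-cost candidates (the "elite" set).
        elite = sorted(samples, key=cost)[:n_elite]
        # Refit the sampling distribution to the elites; a small floor keeps exploring.
        mean = [statistics.fmean(e[i] for e in elite) for i in range(dim)]
        std = [statistics.pstdev([e[i] for e in elite]) + 1e-3 for i in range(dim)]
    return mean


# Hypothetical objective: distance of the policy parameters from an unknown optimum.
target = [1.0, -2.0]
theta = cross_entropy_method(lambda th: sum((a - b) ** 2 for a, b in zip(th, target)), dim=2)
print(theta)
```

No gradient of the cost is ever computed, which is the property the abstract relies on for escaping local optima; in a real controller each cost evaluation would be one or more simulated episodes.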
Abstract
Purpose
The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL).
Design/methodology/approach
Reinforcement learning (RL) techniques provide a powerful solution for sequential decision-making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, must be provided beforehand. In practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems. This inspired the development of IRL, an extension of RL that tackles this problem directly by learning the reward function from expert demonstrations. In this paper, the original IRL algorithms, their close variants and their recent advances are reviewed and compared.
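Many classic IRL methods, such as apprenticeship learning via feature matching, assume the reward is linear in state features, r(s) = w·φ(s), and compare the discounted feature expectations of the expert with those of the current learner. The sketch below shows those two ingredients under that assumption; the function names, the one-hot features and the trajectories are hypothetical, and a complete algorithm would alternate the weight update with re-solving the forward RL problem under the new reward.

```python
def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations: mean over trajectories of sum_t gamma^t phi(s_t)."""
    n_features = len(phi(trajectories[0][0]))
    mu = [0.0] * n_features
    for traj in trajectories:
        for t, state in enumerate(traj):
            for k, value in enumerate(phi(state)):
                mu[k] += (gamma ** t) * value
    return [m / len(trajectories) for m in mu]


def reward_weights(mu_expert, mu_learner):
    """Point the reward weights from the learner's feature expectations toward the expert's."""
    diff = [e - p for e, p in zip(mu_expert, mu_learner)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff] if norm > 0 else diff


# Hypothetical 3-state world with one-hot features; the expert tends to reach state 2.
phi = lambda s: [1.0 if s == k else 0.0 for k in range(3)]
expert = [[0, 1, 2, 2], [0, 2, 2, 2]]
learner = [[0, 0, 1, 1], [0, 1, 0, 0]]

mu_e = feature_expectations(expert, phi, gamma=0.5)
mu_l = feature_expectations(learner, phi, gamma=0.5)
w = reward_weights(mu_e, mu_l)
print(mu_e, mu_l, w)
```

The recovered weights are positive on state 2 and negative on state 0, i.e. the inferred reward favours the behaviour the expert demonstrated, without that reward ever being specified by hand.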
Findings
This paper can serve as an introductory guide to the fundamental theory, developments and applications of IRL.
Originality/value
This paper surveys the theories and applications of IRL, a recent development of RL; such a survey has not been done so far.
Pardis Pourghomi, Milan Dordevic and Fadi Safieddine
Abstract
Purpose
In March 2019, Facebook updated its security procedures, requiring ID verification for people who wish to advertise or promote political posts or adverts. The announcement received little media coverage even though it is an interesting development in the battle against fake news. This paper aims to review the current literature on different approaches to combating the spread of fake news, including the use of computer algorithms, artificial intelligence (AI) and the introduction of ID checks.
Design/methodology/approach
Central to the evaluation is the consideration of ID checks as a means to combat the spread of fake news. To understand the process and how it works, the team undertook a social experiment combined with reflective analysis to better understand the impact of ID check policies when combined with the other standard policies of a typical platform.
Findings
The analysis identifies grave concerns. In a wider context, standardising such a policy will leave political activists in some countries vulnerable to reprisal from authoritarian regimes. Others affected include people who use fake names to protect the identity of adopted children or to preserve their anonymity from abusive partners.
Originality/value
The analysis also points to the fact that troll armies could bypass these checks, rendering ID checks less effective in the battle against fake news.
Donghee (Don) Shin, Anestis Fotiadis and Hongsik Yu
Abstract
Purpose
The purpose of this study is to offer a roadmap for work on the ethical and societal implications of algorithms and AI. Based on an analysis of the social, technical and regulatory challenges posed by algorithmic systems in Korea, this work conducts socioecological evaluations of the governance of algorithmic transparency and accountability.
Design/methodology/approach
This paper analyzes algorithm design and development from critical socioecological angles: social, technological, cultural and industrial phenomena that represent the strategic interaction among people, technology and society, touching on sensitive issues of a legal, cultural and ethical nature.
Findings
Algorithm technologies are part of a social ecosystem, and their development should be based on user interests and rights within a social and cultural milieu. An algorithm represents an interrelated, multilayered ecosystem of networks, protocols, applications, services, practices and users.
Practical implications
Value-sensitive algorithm design is proposed as a novel approach for designing algorithms. As algorithms have become a constitutive technology that shapes human life, it is essential to be aware of the value-ladenness of algorithm development. Human values and social issues can be reflected in an algorithm design.
Originality/value
The arguments in this study help ensure the legitimacy and effectiveness of algorithms. This study provides insight into the challenges and opportunities of algorithms through the lens of a socioecological analysis: political discourse, social dynamics and technological choices inherent in the development of algorithm-based ecology.
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao
Abstract
Purpose
This paper studies how an agent can learn from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to overcome the limitations of the demonstration data and reduces the dimensionality of its exploration space to speed up training convergence.
Design/methodology/approach
First, a decay weight function is introduced into the objective function of the agent's training to combine the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a method for coupling the demonstration trajectories with the training experience, so that samples from both sources can be combined during the agent's learning process, improving data utilization and the agent's learning speed.
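The abstract names a decay weight function in the training objective and a coupling of demonstration and experience samples, but not their exact forms. A minimal sketch, assuming an exponentially decaying imitation weight λ(t) that blends the losses as (1 − λ) L_RL + λ L_IL and also sets the share of demonstration samples in each batch; the function names, the decay rate and this mixing rule are all hypothetical.

```python
import math
import random


def il_weight(step, lam0=1.0, decay=1e-3):
    """Hypothetical exponentially decaying weight on the imitation-learning term (lam0 <= 1)."""
    return lam0 * math.exp(-decay * step)


def combined_loss(rl_loss, il_loss, step):
    """Blend the objectives: IL dominates early in training, RL dominates later."""
    lam = il_weight(step)
    return (1.0 - lam) * rl_loss + lam * il_loss


def mixed_batch(demo_buffer, experience_buffer, batch_size, step, rng=random):
    """Draw a batch whose share of demonstration samples follows the same decaying weight.

    Assumes experience_buffer holds at least batch_size - n_demo samples.
    """
    n_demo = min(len(demo_buffer), round(batch_size * il_weight(step)))
    n_exp = batch_size - n_demo
    return rng.sample(demo_buffer, n_demo) + rng.sample(experience_buffer, n_exp)


# Hypothetical buffers: demonstrations are 0..99, agent experience is 100..199.
demos, experience = list(range(100)), list(range(100, 200))
rng = random.Random(0)
early = mixed_batch(demos, experience, 32, step=0, rng=rng)
middle = mixed_batch(demos, experience, 32, step=693, rng=rng)   # weight is about 0.5 here
print(combined_loss(2.0, 4.0, step=0), sum(x < 100 for x in middle))
```

Tying the batch composition to the same weight is only one possible coupling; the point is that demonstrations steer the agent early on, while its own experience gradually takes over so it is not capped by the demonstrator.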
Findings
The method outperforms other algorithms in convergence speed and decision stability, avoids training from scratch for reward values and overcomes the restrictions imposed by demonstration data.
Originality/value
The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms built on the experience of demonstration trajectories. The demonstration data set used in IL and the experience samples obtained during RL are coupled to improve data utilization efficiency and the generalization ability of the agent.
Adolfo Perrusquía, Wen Yu and Alberto Soria
Abstract
Purpose
Position/force control of a robot requires the parameters of the impedance model and generates the desired position from the contact force with the environment. When the environment is unknown, learning algorithms are needed to estimate both the desired force and the parameters of the impedance model.
Design/methodology/approach
In this paper, the authors use reinforcement learning to learn only the desired force, and then use proportional-integral-derivative admittance control to generate the desired position. Experimental results are presented to verify the approach.
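The admittance stage described above turns a force error into a position reference. A minimal sketch of a PID admittance law, assuming a constant desired force in place of the RL-learned one, an environment modelled as a linear spring with a hypothetical stiffness, and an ideal inner position loop that reaches each reference within one sample; the function name and all gains are hypothetical and chosen only so this toy loop is stable.

```python
def pid_admittance(f_desired, k_env, kp, ki, kd, dt, steps):
    """Return the contact-force history when a PID admittance law sets the position reference."""
    integral = 0.0
    prev_error = 0.0
    x_ref = 0.0                          # position reference sent to the (ideal) position loop
    forces = []
    for _ in range(steps):
        force = k_env * x_ref            # hypothetical linear-spring contact model
        error = f_desired - force        # force error drives the admittance law
        integral += error * dt
        derivative = (error - prev_error) / dt
        x_ref = kp * error + ki * integral + kd * derivative
        prev_error = error
        forces.append(force)
    return forces


# Hypothetical values: 10 N target force, 1000 N/m environment, 100 Hz control loop.
forces = pid_admittance(f_desired=10.0, k_env=1000.0, kp=1e-4, ki=0.03, kd=2e-7,
                        dt=0.01, steps=300)
print(f"final contact force: {forces[-1]:.4f} N")
```

The integral term is what holds the steady-state reference at f_desired / k_env without the controller knowing k_env. With these gains the force error decays geometrically; larger gains or a stiffer environment can make the same law oscillate, which is why closed-loop stability needs to be established.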
Findings
The position error is minimized without knowing the environment or the impedance parameters. Another advantage of this simplified position/force control is that the feedback control mechanism avoids the transformation from Cartesian space to joint space by inverse kinematics. The stability of the closed-loop system is proven.
Originality/value
The position error is minimized without knowing the environment or the impedance parameters. The stability of the closed-loop system is proven.
Xiaohan Xu, Xudong Huang, Ke Zhang and Ming Zhou
Abstract
Purpose
In general, existing compressor design methods require abundant knowledge and inspiration. The purpose of this study is to identify an intelligent design optimization method that enables a machine to learn how to design compressors.
Design/methodology/approach
The airfoil design process was solved using the reinforcement learning (RL) method. An intelligent method based on a modified deep deterministic policy gradient (DDPG) algorithm was implemented and applied to agents, which learned the design policy under dynamic constraints. The agents explored the design space with the help of a surrogate model and airfoil parameterization.
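The abstract does not describe how the authors modified DDPG. For reference only, the sketch below shows two ingredients of standard, unmodified DDPG: the critic's temporal-difference target computed with target networks, y = r + γ Q′(s′, μ′(s′)), and the soft (Polyak) update of the target-network parameters. Representing parameters as flat lists and the networks as plain callables is a simplifying assumption, and the function names are hypothetical.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging used by DDPG: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


def critic_target(reward, next_state, done, target_actor, target_critic, gamma=0.99):
    """TD target for the critic: y = r + gamma * Q'(s', mu'(s')), or just r at a terminal state."""
    if done:
        return reward
    return reward + gamma * target_critic(next_state, target_actor(next_state))


# Toy stand-ins: a linear target actor and an additive target critic (hypothetical).
actor = lambda s: 0.5 * s
critic = lambda s, a: s + a
print(critic_target(1.0, 2.0, False, actor, critic, gamma=0.5))
print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
```

Slowly tracking target networks keep the TD target from chasing a critic that changes every step, which is the main stabilising device DDPG adds to the actor-critic update.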
Findings
The agents successfully learned to design the airfoils. The loss coefficients of a controlled diffusion airfoil improved by 1.25% and 3.23% in the two- and four-dimensional design spaces, respectively. The agents successfully learned to design under various constraints. Additionally, the modified DDPG method was compared with a genetic algorithm optimizer, verifying that the former was one to two orders of magnitude faster in policy searching. The NACA65 airfoil was redesigned to verify the generalization.
Originality/value
It is feasible to treat compressor design as an RL problem. Trained agents can determine and record the design policy and adapt it to different initial conditions and dynamic constraints. More intelligence is demonstrated than when traditional optimization methods are used. This methodology represents a new, small step toward the intelligent design of compressors.
Joy P. Vazhayil and R. Balasubramanian
Abstract
Purpose
Optimization of energy planning for growth and sustainable development has become very important in the context of climate change mitigation imperatives in developing countries. Existing models do not capture developing country realities adequately. The purpose of this paper is to conceptualize a framework for energy strategy optimization of the Indian energy sector, which can be applied in all emerging economies.
Design/methodology/approach
The hierarchical multi‐objective policy optimization methodology adopts a policy‐centric approach and groups the energy strategies into multi‐level portfolios based on the convergence of objectives appropriate to each level. This arrangement facilitates application of the optimality principle of dynamic programming. Synchronised optimization of strategies with respect to the common objectives at each level results in optimal policy portfolios.
Findings
The reductionist policy‐centric approach to complex energy economy modelling, facilitated by the dynamic programming methodology, is most suitable for policy optimization in the context of a developing country. Barriers to project implementation and cost risks are critical features of developing countries which are captured in the framework in the form of a comprehensive risk barrier index. Genetic algorithms are suitable for optimization of the first level objectives, while the efficiency approach, using restricted weight stochastic data envelopment analysis, is appropriate for higher levels of the objective hierarchy.
Research limitations/implications
The methodology has been designed for application to the energy sector planning for India's 12th Five Year Plan for which the objectives of faster growth, better inclusion, energy security and sustainability have been identified. The conceptual framework combines, within the policy domain, the bottom‐up and top‐down processes to form a hybrid modelling approach yielding optimal outcomes, transparent and convincing to the policy makers. The research findings have substantial implications for transition management to a sustainable energy framework.
Originality/value
The methodology is general in nature and can be employed in all sectors of the economy. It is especially suited to policy design in developing countries with the ground realities factored into the model as project barriers. It offers modularity and flexibility in implementation and can accommodate all the key strategies from diverse sectors along with multiple objectives in the policy optimization process. It enables adoption of an evidence‐based and transparent approach to policy making. The research findings have substantial value for transition management to a sustainable energy framework in developing countries.
Guanzheng Wang, Yinbo Xu, Zhihong Liu, Xin Xu, Xiangke Wang and Jiarun Yan
Abstract
Purpose
This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL). It also aims to address the low sample efficiency of DRL and speed up training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems.
Design/methodology/approach
In this paper, a fully distributed collision detection and avoidance approach for multi-UAV based on DRL is proposed. A method that integrates human experience into policy training via a human experience-based adviser is proposed. The authors propose a hybrid control method which combines the learning-based policy with traditional model-based control. Extensive experiments including simulations, real flights and comparative experiments are conducted to evaluate the performance of the approach.
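The abstract does not give the rule for combining the learned policy with model-based control. A minimal sketch of one common pattern, assuming a distance-based switch: the UAV follows the learned velocity command unless a neighbour comes within a safety radius, in which case a simple potential-field repulsion takes over. The function name, the 2-D positions, the safety radius and the gain are all hypothetical.

```python
import math


def hybrid_command(learned_cmd, own_pos, neighbours, d_safe=2.0, gain=1.0):
    """Use the learned velocity command unless a neighbour is closer than d_safe.

    Inside d_safe, fall back to a model-based repulsive command that pushes the
    UAV away from every neighbour within range (hypothetical switching rule).
    """
    close = [p for p in neighbours if math.dist(own_pos, p) < d_safe]
    if not close:
        return learned_cmd
    vx, vy = 0.0, 0.0
    for p in close:
        dx, dy = own_pos[0] - p[0], own_pos[1] - p[1]
        d = math.hypot(dx, dy) or 1e-9    # avoid division by zero if positions coincide
        push = gain * (d_safe - d) / d    # stronger push the closer the neighbour is
        vx += push * dx
        vy += push * dy
    return (vx, vy)


# The learned policy asks to fly +x; a neighbour far away leaves it untouched,
# while a neighbour 1 m ahead triggers the model-based avoidance command.
print(hybrid_command((1.0, 0.0), (0.0, 0.0), [(5.0, 0.0)]))
print(hybrid_command((1.0, 0.0), (0.0, 0.0), [(1.0, 0.0)]))
```

Each UAV evaluates this rule using only its own position and the neighbours it senses, so the switch keeps the fully distributed character of the approach.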
Findings
A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that training is significantly accelerated when human experience is integrated, and the mean episode reward is higher than with the pure DRL method. The experimental results show that the DRL method with human experience integration significantly outperforms the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight brought by the hybrid control method has also been validated.
Originality/value
The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared with the pure DRL method. The proposed hybrid control strategy compensates for the shortcomings of two-dimensional light detection and ranging (LiDAR) and other practical problems in applications.