
Search results

1 – 10 of over 40000
Article
Publication date: 17 August 2012

A survey of inverse reinforcement learning techniques

Shao Zhifei and Er Meng Joo

Abstract

Purpose

The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL).

Design/methodology/approach

Reinforcement learning (RL) techniques provide a powerful solution for sequential decision making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, needs to be provided beforehand. In practice, the reward function can be very hard to specify and tedious to tune for large and complex problems, and this inspires the development of IRL, an extension of RL, which directly tackles this problem by learning the reward function through expert demonstrations. In this paper, the original IRL algorithms, their close variants and their recent advances are reviewed and compared.
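
As a rough illustration of what "learning the reward function from demonstrations" means in practice, the sketch below (not from the paper) checks, for a tiny hypothetical MDP, which candidate reward vectors make a demonstrated policy optimal; this is the core constraint behind the classic linear-programming IRL formulation. The transition matrices, the expert action and the naive random search standing in for a proper LP solver are all illustrative assumptions.

```python
# A minimal sketch of the core idea behind linear-programming IRL: find a
# reward vector under which the demonstrated (expert) policy is optimal.
# The tiny 3-state, 2-action MDP below is hypothetical.
import numpy as np

n_states, gamma = 3, 0.9
# P[a] is the state-transition matrix when action a is taken in every state.
P = {
    0: np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]]),
    1: np.array([[0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8],
                 [0.8, 0.1, 0.1]]),
}
expert_action = 0  # the expert always picks action 0 in this toy example

def optimality_margins(R):
    """Per-state Bellman margins (P_a* - P_a)(I - gamma P_a*)^-1 R; the
    expert policy is optimal for R iff every margin is non-negative."""
    inv = np.linalg.inv(np.eye(n_states) - gamma * P[expert_action])
    return (P[expert_action] - P[1]) @ inv @ R

# Naive search over bounded random reward vectors in place of the LP solver.
rng = np.random.default_rng(0)
best_R, best_score = None, -np.inf
for _ in range(5000):
    R = rng.uniform(-1, 1, n_states)
    margins = optimality_margins(R)
    if margins.min() >= 0 and margins.sum() > best_score:  # feasible and better
        best_R, best_score = R, margins.sum()
print("recovered reward (up to scale):", best_R)
```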

Findings

This paper can serve as an introductory guide to the fundamental theory and developments of IRL, as well as its applications.

Originality/value

This paper surveys the theories and applications of IRL, a recent extension of RL that has not previously been surveyed.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 5 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/17563781211255862
ISSN: 1756-378X

Keywords

  • Inverse reinforcement learning
  • Reward function
  • Reinforcement learning
  • Artificial intelligence
  • Learning methods

Article
Publication date: 23 March 2012

An applied organizational rewards distribution system

Pratim Datta

Abstract

Purpose

How can managers optimally distribute rewards among individuals in a job group? While the management literature on compensation has established the need for equitable reimbursements for individuals holding similar positions in a function or group, an objective grounding of rewards allocation has so far escaped scrutiny. This paper aims to address this issue.

Design/methodology/approach

Using an optimization model based on a financial rubric, the portfolio approach allows organizations to envision human capital assets as a set (i.e. a team, group, function), rather than independent contractors. The portfolio can be organized and managed for meeting various organizational objectives (e.g. optimizing returns and instrumental benefits, assessing resource allocations).
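
The abstract does not give the model's exact form, so the sketch below is only a hedged illustration of the portfolio-style idea: a fixed rewards pool is allocated across a job group so that payouts track merit-based targets while dispersion is penalised. The merit scores, per-person floor and dispersion weight are invented for illustration.

```python
# A hedged sketch (not the paper's model) of treating a job group's rewards
# budget like a portfolio of human capital assets.
import numpy as np
from scipy.optimize import minimize

budget = 100_000.0
performance = np.array([0.9, 0.7, 0.8, 0.6])       # hypothetical merit scores
target = budget * performance / performance.sum()  # merit-proportional targets

def objective(x, lam=0.5):
    # Trade off fidelity to merit-based targets against variance in payouts,
    # loosely mirroring a risk-return trade-off in portfolio optimization.
    return np.sum((x - target) ** 2) + lam * np.var(x)

cons = [{"type": "eq", "fun": lambda x: x.sum() - budget}]  # spend the pool exactly
floor = 0.15 * budget / len(performance)                    # illustrative per-person floor
bnds = [(floor, None)] * len(performance)

res = minimize(objective, x0=target, bounds=bnds, constraints=cons, method="SLSQP")
print("allocations:", np.round(res.x, 2))
```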

Findings

This research introduces an innovative portfolio management scheme for employee rewards distribution. Akin to investing in capital assets, organizations invest considerable resources in their human capital. In doing so, organizations, over time, create a portfolio of human capital assets. The findings reduce large variances in rewards distribution while still serving employee and management considerations.

Practical implications

The research has significant implications for managers, who can mitigate serious rewards-equity issues by creating a rewards distribution process, illustrated here through four different rewards allocation scenarios based on varying managerial prerogatives.

Originality/value

This research presents a unique model that addresses a pressing human resource issue with a usable and feasible optimization mechanism drawn from financial portfolio theory.

Details

Management Decision, vol. 50 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/00251741211216241
ISSN: 0025-1747

Keywords

  • Asset management
  • Costing
  • Human capital
  • Optimization
  • Portfolio management
  • Rewards distribution

Article
Publication date: 2 November 2010

Delay‐discounting rewards from consumer sales promotions

Kesha K. Coker, Deepa Pillai and Siva K. Balasubramanian

Abstract

Purpose

Rewards from sales promotions may be either immediate (e.g. instant savings, coupons, instant rebates) or delayed (e.g. rebates, refunds). The latter type is of interest in this study. The purpose of this paper is to present the hyperbolic discounting framework as an explanation for how consumers delay‐discount rewards, and test whether this holds for both high‐price and low‐price product categories.

Design/methodology/approach

Data were collected by administering two online surveys to respondents. One survey presented choice scenarios between sales promotion formats for a high‐priced product (a laptop, n=154) and the other for a low‐priced product (a cell phone, n=98). Hyperbolic and exponential functions were then fitted to the data.
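
A minimal sketch of the comparison being run, assuming the standard one-parameter forms V = A·exp(-kD) for exponential and V = A/(1 + kD) for hyperbolic discounting of a reward of value A delayed by D; the delay/value data points below are invented, not the survey data.

```python
# Fit exponential and hyperbolic discounting curves to hypothetical data and
# compare their residual error, mirroring the paper's model comparison.
import numpy as np
from scipy.optimize import curve_fit

delay = np.array([0, 7, 30, 90, 180], dtype=float)    # days until reward
value = np.array([100, 80, 55, 35, 25], dtype=float)  # perceived value (hypothetical)

def exponential(D, k):
    return value[0] * np.exp(-k * D)

def hyperbolic(D, k):
    return value[0] / (1 + k * D)

for name, f in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    (k,), _ = curve_fit(f, delay, value, p0=[0.01])
    rss = np.sum((value - f(delay, k)) ** 2)           # residual sum of squares
    print(f"{name}: k={k:.4f}, RSS={rss:.1f}")
```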

Findings

The hyperbolic function had a better fit than the exponential function for the low‐priced product. However, this effect was not evident in the case of the high‐priced product; no significant difference was found between the functions. The rate of discounting was greater for the high‐priced product than for the low‐priced product. Thus, for low‐priced products, rather than discount a reward rationally, consumers tend to discount the value of the reward at a decreasing rate.

Originality/value

This study addresses delay discounting in the context of a typical consumer buying situation. It also addresses the possibility of consumers applying different forms of discounting to products at different price levels and tests for the same. The results are of considerable significance for marketers wishing to offer price discounts to consumers. For low‐priced products, marketers seem to have more flexibility in delaying the reward, since the rate of discounting decreases for longer delay periods. At the same time, the discount rate for high‐priced products is higher than that for low‐priced products, hence delay periods may have a more critical role as discounted values fall steeply with an increase in delay to reward.

Details

Journal of Product & Brand Management, vol. 19 no. 7
Type: Research Article
DOI: https://doi.org/10.1108/10610421011086900
ISSN: 1061-0421

Keywords

  • Discounts
  • Coupons
  • Consumer behaviour

Article
Publication date: 1 February 1973

ON THE QUANTITATIVE FORMULATION OF THE HUMAN REWARD FUNCTION: Part I. Theory

GEORGE C. THEODORIDIS

Abstract

An information‐like formulation of the human reward function is shown to be in qualitative agreement with some prominent features of human behavior. Individual events are regarded as “symbols” in a communication theory sense, and their reward for a person depends on their frequency of occurrence in his environment.
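
The abstract does not state the exact formula, so the following is only a hedged reading of an "information-like" reward in the communication theory sense: each event is scored by its self-information, -log2 p, so that rarer events carry more reward. The event frequencies are hypothetical.

```python
# Illustrative only: score events by self-information, one possible reading of
# "reward depends on frequency of occurrence".
import math

event_frequencies = {"routine": 0.70, "uncommon": 0.25, "rare": 0.05}  # hypothetical

for event, p in event_frequencies.items():
    reward = -math.log2(p)          # self-information in bits
    print(f"{event:8s} p={p:.2f} reward={reward:.2f} bits")
```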

Details

Kybernetes, vol. 2 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/eb005328
ISSN: 0368-492X

Article
Publication date: 8 June 2010

Solving two‐armed Bernoulli bandit problems using a Bayesian learning automaton

Ole‐Christoffer Granmo

Abstract

Purpose

The two‐armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting either in a reward or a penalty. The reward probabilities of each arm are unknown, and thus one must balance between exploiting existing knowledge about the arms, and obtaining new information. The purpose of this paper is to report research into a completely new family of solution schemes for the TABB problem: the Bayesian learning automaton (BLA) family.

Design/methodology/approach

Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. BLA avoids the problem of computational intractability by not explicitly performing the Bayesian computations. Rather, it is based upon merely counting rewards/penalties, combined with random sampling from a pair of twin Beta distributions. This is intuitively appealing since the Bayesian conjugate prior for a binomial parameter is the Beta distribution.
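
The mechanism described, counting rewards and penalties and sampling from twin Beta distributions, can be sketched as follows; the true arm probabilities and the horizon are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a Bayesian learning automaton for the two-armed Bernoulli
# bandit: keep Beta posteriors per arm, sample from both, pull the larger.
import numpy as np

rng = np.random.default_rng(42)
true_p = [0.65, 0.55]                 # unknown to the agent (hypothetical)
alpha = np.ones(2)                    # reward counts + 1 (Beta prior)
beta = np.ones(2)                     # penalty counts + 1

for t in range(10_000):
    samples = rng.beta(alpha, beta)   # one draw from each arm's posterior
    arm = int(np.argmax(samples))     # pull the arm with the larger sample
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("estimated reward probabilities:", np.round(alpha / (alpha + beta), 3))
print("pull counts:", (alpha + beta - 2).astype(int))
```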

Findings

BLA is proven to be instantaneously self-correcting, and it converges to pulling only the optimal arm with probability as close to unity as desired. Extensive experiments demonstrate that the BLA does not rely on external learning speed/accuracy control. It also outperforms established non-Bayesian top performers for the TABB problem. Finally, the BLA provides superior performance in a distributed application, namely, the Goore game (GG).

Originality/value

The value of this paper is threefold. First, the reported BLA takes advantage of the Bayesian perspective for tackling TABBs, yet avoids the computational complexity inherent in Bayesian approaches. Second, the improved performance offered by the BLA opens up increased accuracy in a number of TABB-related applications, such as the GG. Third, the reported results form the basis for a new avenue of research, even for cases when the reward/penalty distribution is not Bernoulli. Indeed, the paper advocates using a Bayesian methodology in conjunction with the appropriate conjugate prior.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 3 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/17563781011049179
ISSN: 1756-378X

Keywords

  • Learning processes
  • Programming and algorithm theory
  • Automata theory
  • Stochastic processes

Book part
Publication date: 21 November 2016

Motivated Cognition: Neural and Computational Mechanisms of Curiosity, Attention, and Intrinsic Motivation

Jacqueline Gottlieb, Manuel Lopes and Pierre-Yves Oudeyer

Abstract

Based on a synthesis of findings from psychology, neuroscience, and machine learning, we propose a unified theory of curiosity as a form of motivated cognition. Curiosity, we propose, is comprised of a family of mechanisms that range in complexity from simple heuristics based on novelty, salience, or surprise, to drives based on reward and uncertainty reduction and finally, to self-directed metacognitive processes. These mechanisms, we propose, have evolved to allow agents to discover useful regularities in the world – steering them toward niches of maximal learning progress and away from both random and highly familiar tasks. We emphasize that curiosity arises organically in conjunction with cognition and motivation, being generated by cognitive processes and in turn, motivating them. We hope that this view will spur the systematic study of curiosity as an integral aspect of cognition and decision making during development and adulthood.

Details

Recent Developments in Neuroscience Research on Human Motivation
Type: Book
DOI: https://doi.org/10.1108/S0749-742320160000019017
ISBN: 978-1-78635-474-7

Keywords

  • Intrinsic motivation
  • active learning
  • memory
  • attention
  • metacognition
  • development

Article
Publication date: 11 April 2020

A novel movies recommendation algorithm based on reinforcement learning with DDPG policy

Qiaoling Zhou

Abstract

Purpose

English original movies play an important role in English learning and communication. To help users find the movies they need among a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have addressed the problem of information overload, they are still limited in cases of cold start and sparse data.

Design/methodology/approach

To solve these problems, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to handle the cold start and sparse data problems and uses Item2vec to transform the discrete action space into a continuous one. In addition, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely.
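
The abstract does not give the exact way the two distances are combined, so the sketch below assumes a simple weighted mix: cosine similarity between the recommended item's embedding and the target item's embedding raises the reward, while Euclidean distance lowers it. The 0.5/0.5 weights and the embeddings are invented.

```python
# Hedged sketch of a reward combining cosine and Euclidean distance between a
# recommended item's Item2vec-style embedding and the item actually chosen.
import numpy as np

def reward(action_vec, target_vec, w_cos=0.5, w_euc=0.5):
    cos_sim = np.dot(action_vec, target_vec) / (
        np.linalg.norm(action_vec) * np.linalg.norm(target_vec))
    euc_dist = np.linalg.norm(action_vec - target_vec)
    # Higher cosine similarity and lower Euclidean distance both raise the reward.
    return w_cos * cos_sim - w_euc * euc_dist

recommended = np.array([0.2, 0.8, 0.1])   # hypothetical embedding of the recommendation
watched = np.array([0.25, 0.7, 0.2])      # embedding of the movie actually chosen
print(f"reward = {reward(recommended, watched):.3f}")
```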

Findings

To verify the feasibility and validity of the proposed algorithm, it is compared with state-of-the-art methods on RMSE, recall rate and accuracy using the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithms on these indicators.

Originality/value

When the proposed algorithm is applied to recommending English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/IJICC-09-2019-0103
ISSN: 1756-378X

Keywords

  • Reinforcement learning
  • Deep deterministic policy gradient
  • English original movies
  • Movies recommendation
  • Cold start

Book part
Publication date: 21 November 2016

Epilogue – Distinct Motivations and Their Differentiated Mechanisms: Reflections on the Emerging Neuroscience of Human Motivation

Abstract

We reflect upon the histories of the behavioral science and the neuroscience of motivation, taking note of how these increasingly consilient disciplines inform each other. This volume’s chapters illustrate how the field has moved beyond the study of immediate external rewards to the examination of neural mechanisms underlying varied motivational and appetitive states. Exemplifying this trend, we focus on emerging knowledge about intrinsic motivation, linking it with research on both the play and exploratory behaviors of nonhuman animals. We also speculate about large-scale brain networks related to salience processing as a possibly unique component of human intrinsic motivation. We further review emerging studies on neural correlates of basic psychological needs during decision making that are beginning to shine light on the integrative processes that support autonomous functioning. As with the contributions in this volume, such research reflects the increasing iteration between mechanistic studies and contemporary psychological models of human motivation.

Details

Recent Developments in Neuroscience Research on Human Motivation
Type: Book
DOI: https://doi.org/10.1108/S0749-742320160000019009
ISBN: 978-1-78635-474-7

Keywords

  • Autonomy
  • decision making
  • intrinsic motivation
  • PLAY system
  • SEEKING system
  • self-determination theory

Article
Publication date: 4 January 2011

Influences on reward mix determination: reward consultants' perspectives

Jonathan Chapman and Clare Kelliher

Abstract

Purpose

Reward research has focussed on level (what individuals are paid) and structure (relationship between different levels of reward). Less emphasis has been given to reward mix decisions, i.e. the relative proportions of each element making up overall reward. This paper seeks to examine the determinants of reward mix.

Design/methodology/approach

Interview-based research was conducted with reward consultants, who act as key organisational observers and participants in reward mix decision making.

Findings

Benchmarking has led to the development of reward mix norms. Organisations are under pressure to conform to these norms, moderated by leadership beliefs, the occurrence of events and the extent to which organisations' change capability can overcome strong institutional forces.

Research limitations/implications

The results question agency theory based explanations of reward mix determination and point towards resource dependence and institutional theory perspectives being more suitable theoretical frameworks.

Practical implications

The model developed allows reward managers to consider how the variables that moderate the dominant mimetic pressure their firm faces could be manipulated to allow greater differentiation of the reward mix.

Originality/value

Academically the work contributes to a programme of research into reward determination from a constructionist perspective and aims to provide greater theoretical robustness to the subject. Practically, the findings may prompt practitioners to think more consciously about the drivers of their firm's reward mix. Policy makers may use the stronger theoretical base for understanding the determinants of reward mix choices and the extent to which organisational free choice and institutionally determined choice influence final choices in reward policy decision making.

Details

Employee Relations, vol. 33 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/01425451111096677
ISSN: 0142-5455

Keywords

  • Pay policies
  • Organizational theory
  • Consultants

Article
Publication date: 1 December 2006

Optimal replacement of systems subject to shocks and random threshold failure

Alagar Rangan, Dimple Thyagarajan and Y Sarada

Abstract

Purpose

The purpose of this paper is to generalize Yeh and Zhang's 2004 random threshold failure model for deteriorating systems.

Design/methodology/approach

An N‐policy was adopted by which the system was replaced after the Nth failure.

Findings

The model was found to have practical applications in warranty cost analysis.

Originality/value

By identifying the occurrence of a shock with the failure of the system, interpreting the threshold times as the warranty period offered, and redefining a lethal shock (here, system failure) as a shock occurring within the threshold period, the generalized model can be used to study renewing warranty cost analysis.
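
As a hedged illustration of the N-policy described above (not the paper's model), the Monte Carlo sketch below replaces the system after its Nth failure, where a failure is a shock whose magnitude exceeds a randomly drawn threshold, and estimates a long-run cost rate by renewal-reward reasoning. All distributions and cost figures are illustrative assumptions.

```python
# Hedged Monte Carlo sketch of an N-policy under shock / random-threshold failures.
import numpy as np

rng = np.random.default_rng(1)
N = 3                                  # replace after the 3rd failure
failure_cost, replace_cost = 50.0, 400.0

def one_cycle():
    t, failures, cost = 0.0, 0, 0.0
    while failures < N:
        t += rng.exponential(10.0)            # time to next shock
        magnitude = rng.exponential(1.0)      # shock magnitude
        threshold = rng.exponential(1.5)      # random failure threshold
        if magnitude > threshold:             # shock is "lethal": a failure occurs
            failures += 1
            cost += failure_cost
    return t, cost + replace_cost             # replacement ends the cycle

cycles = [one_cycle() for _ in range(20_000)]
lengths, costs = np.array(cycles).T
print(f"long-run cost rate ≈ {costs.mean() / lengths.mean():.2f} per unit time")
```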

Details

International Journal of Quality & Reliability Management, vol. 23 no. 9
Type: Research Article
DOI: https://doi.org/10.1108/02656710610704267
ISSN: 0265-671X

Keywords

  • Systems theory
  • Replacement control
  • Production methods
  • Production planning and control
