Evolutionary-learning framework: improving automatic swarm robotics design

Faqihza Mukhlish (School of Mechanical and Manufacturing Engineering, University of New South Wales, Sydney, Australia)
John Page (School of Mechanical and Manufacturing Engineering, University of New South Wales, Sydney, Australia)
Michael Bain (School of Computer Science and Engineering, University of New South Wales, Sydney, Australia)

International Journal of Intelligent Unmanned Systems

ISSN: 2049-6427

Publication date: 8 October 2018

Abstract

Purpose

The purpose of this paper is to review the current state of research in automatic swarm design and to discuss possible solutions for advancing swarm robotics research.

Design/methodology/approach

First, this paper reviews the current state of research in automatic swarm design to provide a basic understanding of the field. This leads to the identification of the issues that need to be resolved to move swarm robotics research forward. Then, possible solutions to these challenges are discussed to identify future directions and show how the proposed idea of incorporating a learning mechanism could benefit swarm robotics design. Lastly, a novel evolutionary-learning framework for swarms based on an epigenetic function is proposed, with a discussion of its merits and suggestions for future research directions.

Findings

The discussion shows that the main challenge to be resolved is the dynamic environment, which arises mainly from agent-to-agent and agent-to-environment interactions. A possible solution is to incorporate a learning capability into the swarm so that it can cope with such an environment.

Originality/value

This paper gives a new perspective on how to improve automatic swarm design in order to move swarm robotics research forward. Along with the discussion, this paper also proposes a novel framework that incorporates a learning mechanism into an evolutionary swarm using an epigenetic function.

Citation

Mukhlish, F., Page, J. and Bain, M. (2018), "Evolutionary-learning framework: improving automatic swarm robotics design", International Journal of Intelligent Unmanned Systems, Vol. 6 No. 4, pp. 197-215. https://doi.org/10.1108/IJIUS-06-2018-0016

Publisher: Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited


1. Introduction

Various strategies have been widely investigated for programming intelligent robots to fulfil missions, in harsh environments, that would usually be carried out by humans. However, less of this research has investigated groups of simple robots, namely swarm robotics. Swarm robotics is currently attracting a lot of attention because of its flexibility, scalability and robustness. Instead of using a complex robot, which may not be cost effective due to the high possibility of failure in a harsh and unpredictable environment, swarm robotics has been explored in the past few years to investigate its ability to handle failures that degrade performance, in addition to issues such as the effect of an unpredictable environment. In recent years, inspired by the collective behaviours of social animals, the idea of distributed low-level intelligence as a decentralised method to control a group of agents has been studied and developed (Brambilla et al., 2013; Bayindir, 2016). However, problems arise when a self-organising swarm tries to tackle tasks in a dynamic environment, and building a strategy to deal with such an environment is not a trivial task. Hence, an approach covering decentralised control, strategy and cooperation between agents is necessary for the swarm to cope with environmental changes or an individual agent's failure.

A common design strategy for achieving distributed low-level intelligence is behaviour-based design inspired by social animals (ants, bees, birds, termites and fish). Since many behaviour-based models are linked to specific tasks and behaviours, these models can be applied to swarm robotics quite easily (Bogue, 2008). A seminal work by Reynolds (1987) on a distributed behavioural model showed that complex behaviour can emerge from the aggregation of simple individual actions. However, a behaviour-based model may solve only a specific problem. As the complexity of the problem increases, behaviour-based design requires more effort to define a mathematical model that is adaptable to an unpredictable environment. Hence, a more effective method of achieving the collective behaviour of a swarm is required. This problem can be addressed using an automatic design method that achieves collective behaviour without having to define the model of the problem explicitly.

One such promising swarm design concept, studied in our previous work (Chi et al., 2014), is the use of evolutionary algorithms (EAs), which can be considered an automatic design method. However, the solution resulting from incorporating an EA into swarm robotics, called an evolutionary swarm, often gives unpredictable results due to random mutation and a changing environment (Yi et al., 2017). This shortcoming is due, in part, to the lack of consideration of the dynamics of the environment caused by agent-to-agent and agent-to-environment interactions. Hence, for an evolutionary swarm to operate in a dynamic environment, information on the stimuli from the environment must be collected and used later as "knowledge" by the next generation of the EA to formulate better actions or behaviours; improving an action based on external stimuli is commonly defined as a learning mechanism (Mitchell, 1997). However, since EAs were mainly inspired by Darwinian evolution (Darwin, 1872), there is no direct mechanism for external knowledge perceived by individuals of one generation to be inherited by those of the next generation (sometimes known as the Lamarckian paradigm). In other words, heritable learning does not play any significant role in the EA. Thus, an approach to incorporating a learning framework into an evolutionary swarm is required in order to widen the perspective on how to improve swarm capability in a dynamic environment.

To generate a comprehensive discussion, this paper begins by reviewing the current state of research in swarm robotics to provide a global perspective of swarm design and automatic design in general. The fundamentals of swarm design are discussed to provide a basic understanding and to stimulate discussion on the foundations of swarm robotics. Several recent articles review swarm robotics from different perspectives (see Brambilla et al., 2013; Bayindir, 2016). This should lead to the identification of the issues that need to be resolved to move swarm robotics research forward. Then, viable solutions are presented to identify future directions and show how the proposed idea of incorporating a learning mechanism could benefit swarm robotics design. In the last section of this paper, we propose a novel evolutionary-learning framework for swarms and explore how to promote its development.

2. Fundamentals of swarm robotics design

Swarm robotics is a method of controlling a group of collaborative simple agents towards completing tasks. Swarm robotics has been defined as:

A novel approach to the coordination of large numbers of robots […] [and] a study of how large number of relatively simple physically embodied agents can be designed such that a desired collective behaviour emerges from the local interactions among the agents and environment.

(Şahin, 2005, p. 3)

The source of inspiration for swarm robotics comes from the collective behaviour of social animals. When simple individuals cooperate in groups, they can perform complex behaviours, as demonstrated by ants, bees, birds, termites and fish. Distributed behavioural models were first investigated in a seminal work by Reynolds (1987). The author showed that complex behaviour, in this case flocking, can emerge naturally from the aggregation of simple individual rules, namely collision avoidance, velocity matching and flock centring. Furthermore, many studies of social animals have shown that there is indeed a group intelligence that emerges from individual behaviour (Ame et al., 2006; Camazine, 2001; Couzin et al., 2005). The main characteristics necessary for a self-organising swarm to function collectively are the following:

  • a self-organised swarm is at least partially autonomous;

  • a self-organised swarm is situated in a dynamic environment caused by agent-to-agent and agent-to-environment interactions;

  • a self-organised swarm is only able to sense on a local scale and interact with nearby agents within the group;

  • swarm control is decentralised, and each agent does not have access to the swarm’s global behaviour;

  • all agents inside a self-organised swarm cooperate to tackle a given task collaboratively; and

  • a self-organised swarm can access and communicate the knowledge gained, such as monitoring data, to a base.

The development of collective behaviour in self-organised swarm robotics has been studied extensively in terms of tackling a given task, applying decentralised control and adapting the swarm to survive in a particular environment (Bayindir, 2016). In the past decade, methods for designing the collective behaviour of swarm robotics have been grouped into two approaches: behaviour-based and automatic design. The latter designs the swarm strategy automatically, without explicitly modelling the problem and the behaviour of the swarm; its benefits are flexibility, scalability and robustness. Behaviour-based design, on the other hand, uses predefined mathematical models or strategies to direct the swarm behaviour, usually known as top-down design; collective behaviour is decomposed into individual behaviours. It is, in fact, more predictable than automatic design, since the particular behaviours are already known (Bogue, 2008). However, behaviour-based design may solve only a specific problem in a fixed environment. In general, automatic design is more flexible than behaviour-based design and provides the swarm with the agility to cope with dynamic problems.

2.1 Behaviour-based design

Developing a swarm robotics system using behaviour-based design has often been inspired by the observation of the collective behaviours of social animals. The design process comprises a cycle of modelling the behaviour, implementing it on the swarm system, and evaluating and improving the current model. Through this process, many behaviour-based swarm designs have been proposed in the past decade. For example, an object retrieval task using a group of robots was accomplished (Labella et al., 2006) by applying the foraging behaviour and division of labour of ants (Deneubourg et al., 1987). Adopting behavioural models based on social animals may ease the swarm design process, since the mathematical model can be derived easily. Two modelling paradigms are commonly used, namely the probabilistic finite state machine (PFSM) and the virtual physics-based approach.

2.1.1 Probabilistic finite state machine

The PFSM builds on the finite state machine formalised by Minsky (1967); its main idea is that transitions between states depend on probability values. In swarm design, a PFSM is often used as a representation of a set of behaviours and the transitions between them. Implementations of PFSMs in swarm design can be seen in several developments of collective behaviour, such as aggregation (Soysal and Şahin, 2005; Correll and Martinoli, 2011; Garnier et al., 2009), chain formation (Nouyan et al., 2008) and task allocation (Labella et al., 2004, 2006; Liu et al., 2007).

The transition threshold between states can be either constant or varying. A finite state machine that uses a constant threshold value can be found in the aggregation model studied by Soysal and Şahin (2005). Specifically, an agent switches from one behaviour to another according to the probability value associated with that transition, as depicted in Figure 1. In contrast, a dynamic transition threshold is defined through a mathematical function of parameters associated with the agent’s available states, as shown in Figure 2. In the study conducted by Garnier et al. (2009), the transition function depends on the number of robots and the maximum capacity of the current state.
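
To make the mechanism concrete, the following minimal Python sketch implements a two-state PFSM; the states, probability values and the capacity-dependent leave probability are illustrative assumptions rather than parameters taken from the cited studies.

    import random

    # Minimal PFSM sketch: states and probabilities are illustrative only.
    TRANSITIONS = {
        # current state -> list of (next state, transition probability)
        "search":    [("search", 0.7), ("aggregate", 0.3)],
        "aggregate": [("aggregate", 0.9), ("search", 0.1)],
    }

    def step(state):
        """Draw the next behaviour according to the transition probabilities."""
        next_states, probs = zip(*TRANSITIONS[state])
        return random.choices(next_states, weights=probs)[0]

    def dynamic_leave_probability(n_robots, capacity):
        """Hypothetical dynamic threshold in the spirit of Garnier et al. (2009):
        the probability of leaving an aggregate falls as it fills towards capacity."""
        return max(0.05, 1.0 - n_robots / capacity)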

2.1.2 Virtual physics-based design

In virtual physics-based design, each robot is considered a particle that applies virtual potential forces to other robots. Usually, the virtual force is a function of the position, direction and distance between the entities (robots, targets and obstacles). More specifically, repulsion and attraction values are also used to determine the force’s direction. The virtual force function generally allows each robot to sense and distinguish neighbouring robots, targets and obstacles.
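
As an illustration, the sketch below computes one such virtual force in Python; the linear spring-like force law and the equilibrium distance d_eq are assumptions chosen for simplicity, since the approach admits many different force functions.

    import math

    def virtual_force(p_self, p_other, d_eq=1.0, gain=1.0):
        """Force on this robot from a neighbour: repulsive inside the assumed
        equilibrium distance d_eq, attractive outside it, along the joining line."""
        dx, dy = p_other[0] - p_self[0], p_other[1] - p_self[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            return (0.0, 0.0)                  # coincident: direction undefined
        scale = gain * (dist - d_eq) / dist    # negative repels, positive attracts
        return (scale * dx, scale * dy)

    # The net force on a robot is the vector sum over neighbours, targets
    # (attractive) and obstacles (repulsive).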

Maintaining the formation of the swarm whilst traversing terrain is a common collective behaviour that utilises virtual physics-based design. One of the seminal works using this approach was by Khatib (1986), which considered another vehicle as a moving obstacle in a field of artificial forces. Similar work can be found in Reynolds (1987), which formulates a general flocking model comprising collision avoidance, velocity matching and flock centring using attraction and repulsion forces. Later, virtual physics-based design also appeared in several frameworks, such as the physicomimetics framework proposed by Spears et al. (2004) and the nonlocal kinetic model proposed by Fetecau (2011).

For specific behaviours such as flocking and foraging, which require a medium for sharing knowledge and information, a stigmergic approach inspired by how social animals share information is commonly used in swarm robotics design. Social animals such as ants share information using trails of pheromones, allowing them to create paths to food sources. Using this approach along with the two models above, several studies have succeeded in mimicking animal behaviours such as foraging (Labella et al., 2006; Ranjbar-Sahraei et al., 2012).

Because of their generality, these two models are often used as foundation models and later combined with an optimisation or search algorithm to tune the models’ parameters automatically.

3. Automatic swarm robotics design

Automatic swarm design aims to achieve collective behaviour without reference to a predefined explicit model. Two automatic design methods commonly used to tackle swarm robotics problems are evolutionary computation and reinforcement learning (RL). How swarm robotics design utilises these two methods is discussed in the following subsections, where the principles of both methods are presented and the current challenges of applying each method are investigated.

3.1 Evolutionary swarm robotics

In robotics, evolutionary computation (Goldberg, 1989) can be used to encode the characteristics of control strategies into artificial chromosomes (Holland, 1992). Each chromosome represents a particular characteristic of a strategy, and its fitness value (performance) is evaluated using a fitness function. Chromosomes (a set of strategies) with high fitness values are allowed to breed through genetic operators such as recombination, random mutation and selection. Progeny with higher fitness (new strategies) replace the least fit strategies of the previous generation. This development is repeated until the fitness value of the new generation meets the designated criterion. This method of generating a control strategy in robotics is defined as evolutionary robotics (ER), as depicted in Figure 3 (Nolfi et al., 2016).
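
A minimal sketch of this loop is shown below, assuming a bit-string chromosome and a placeholder fitness function; in evolutionary swarm robotics the fitness of each chromosome would instead come from evaluating the swarm's performance in simulation.

    import random

    GENOME_LEN, POP_SIZE, MUT_RATE = 16, 20, 0.05

    def fitness(genome):
        return sum(genome)  # placeholder: in ER this is a swarm performance measure

    def mutate(genome):
        return [1 - g if random.random() < MUT_RATE else g for g in genome]

    def crossover(a, b):
        cut = random.randrange(1, GENOME_LEN)   # one-point recombination
        return a[:cut] + b[cut:]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    for generation in range(50):
        population.sort(key=fitness, reverse=True)
        parents = population[:POP_SIZE // 2]               # selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children   # progeny replace the least fit strategies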

Evolutionary computation has proven successful in solving single-agent problems (Eiben and Smith, 2015). In multi-agent systems (MASs) such as swarm robotics, evolutionary computation is also beneficial in generating collective behaviour despite the nonlinearities caused by the number of agents and the given tasks. For example, many studies show that evolutionary computation can be used to acquire collective behaviours such as foraging (Francesca et al., 2012; Gauci, Chen, Dodd and Groß, 2014; Trianni et al., 2003), flocking (Baldassarre et al., 2003), path formation (Kuyucu et al., 2012), clustering (Gauci, Chen, Li, Dodd and Gross, 2014; Hartmann, 2005), collective object transport (Groß and Dorigo, 2004, 2009) and task allocation (Tuci et al., 2008).

3.2 Multi-agent reinforcement learning (MARL)

MARL is the research field that applies RL techniques to MASs and studies the design of algorithms that create adaptive agents. Like ER, MARL has been used to find solutions, not defined beforehand, whose behaviour is considered optimal for a given situation. In swarm robotics, the learning mechanism most commonly used is RL (Brambilla et al., 2013; Busoniu et al., 2008). RL on a MAS allows an individual to learn a behaviour through trial-and-error interactions with the environment and other agents (Kaelbling et al., 1996; Sen and Weiss, 1999; Sutton and Barto, 1998); see Figure 4. In each interaction, the action taken by each agent in the corresponding state of the environment is rewarded based on its performance. Using the sum of rewards, actions and states are paired into a set of state-action values serving as a rule for how the swarm behaves in a given state of the environment.
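
The trial-and-error loop described above can be sketched with tabular Q-learning, one common RL method; the action set, learning parameters and environment interface are assumptions, and in a MARL setting each agent would hold such a table (or share one) while the others learn alongside it.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # assumed learning parameters
    ACTIONS = [0, 1, 2, 3]                    # assumed discrete action set
    Q = defaultdict(float)                    # (state, action) -> estimated return

    def choose_action(state):
        if random.random() < EPSILON:                       # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])    # exploit best known

    def update(state, action, reward, next_state):
        """Move the state-action estimate towards reward plus discounted future value."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])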

RL in swarm robotics takes advantage of the multi-agent setting. First, the multi-agent setting allows the learning process to be computed in parallel. Second, knowledge perceived by an agent from trial-and-error interaction with the environment can be shared with other agents that have similar tasks; teaching and imitation are examples of sharing media within the group. Lastly, redundancy can also be exploited: when one or more agents fail to accomplish a task, the remaining agents can take over and complete it. Additionally, RL has been acknowledged to perform quite well in MASs such as swarms (Busoniu et al., 2008). Several learning methods considered as MARL have been implemented in, for example, swarms of odour localisation robots (Hayes et al., 2003), stick-pulling robots (Li, Martinoli et al., 2004) and RoboCup Soccer (Kalyanakrishnan and Stone, 2007; Riedmiller et al., 2009; Stone et al., 2005). A summary of current approaches in automatic swarm robotics design is shown in Figure 5.

3.3 Challenges

Despite the advantages of evolutionary computation and RL in automatic swarm design, there are several challenges in utilising automatic design to achieve collective behaviour in a swarm.

3.3.1 Deception

In solving a problem, both evolutionary computation and machine learning commonly utilise a fitness function (also known as an objective function). However, the computation process is often misguided by the fitness function and converges on deceptive local optima. The fitness function acts as selective pressure for the computation process to improve its performance (Holland, 1992). This function is defined beforehand and derived from the tasks to be solved, which becomes challenging as complexity increases. Hence, in a dynamic environment, the computation is prone to deceptive solutions. Indeed, Lehman and Stanley (2011) concluded that a performance-based fitness function can be misleading, especially for dynamic and complex problems.

3.3.2 Exploration and exploitation dilemma

Another problem arises when a fitness function is used. The swarm will exploit an action that maximises its performance and readily dismiss any seemingly inept behaviour. This problem is tightly related to the deception problem discussed above. To counter it, exploration techniques are used to suppress exploitation. Exploration is beneficial for seeking alternative behaviours that may lead to better performance in the future. However, adding an exploration technique is not without problems: the collective behaviour of the swarm will be unstable if too much exploration is applied. This is where the exploration-exploitation dilemma arises. The problem is even more complicated in a multi-agent learning system because of the presence of co-learning.

3.3.3 Non-stationary behaviour

Automatic design allows each agent to evolve and learn simultaneously. However, the non-stationary behaviours of the multi-agent learning problem arise precisely because all agents in the system are learning simultaneously. This generates a situation in which each agent is trying to develop its own best behaviour while the other agents’ best behaviours change. Each agent is, therefore, faced with a moving-target learning problem. This makes the computation process of achieving collective behaviour automatically more challenging. The main impact of the non-stationarity problem in a co-learning setting is the risk of unstable behaviours.

3.3.4 The curse of dimensionality

MASs that utilise a learning mechanism such as RL, which maps states to best behaviours, suffer from the “curse of dimensionality”. The number of possible discrete states of the environment resulting from MARL grows exponentially as the number of agents increases. As the set of possible discrete states or state-action pairs becomes larger, the computation required to choose the best behaviour or policy for the current state becomes more complex over time and requires extra computing time.

4. Improving automatic-based design

4.1 Maintaining diversity

The fitness or objective function is a tool to measure how good the current behaviour is in the current situation. Evolutionary computation and learning mechanisms are used to find better behaviours that maximise the fitness value. Behaviours with the poorest performance will be removed, although these may be beneficial in determining the true direction to the global optimum. This objective-based search paradigm does not necessarily measure how good an intermediate decision is at finding a direction that leads to the objective. Hence, several options to overcome deception are discussed in the following sections.

4.1.1 Sustaining behavioural diversity

In fitness-based computation, all available strategies in the system’s repertoire converge asymptotically towards a set of behaviours that may maximise the fitness. The result typically lies within a certain range of values and is applied to all agents. This is an issue because alternative solutions, which may lead to better behaviour, may be dismissed in the computation process. This problem occurs because the computation prefers exploitation over exploration. Hence, the diversity of the solutions should be maintained, giving the swarm alternative directions in which to explore better behaviours.

Sustaining behavioural diversity is one of many ways to improve the efficiency of fitness-based evolutionary computation, both empirically (Goldberg, 1989; Mahfoud, 1997; Sareni and Krahenbuhl, 1998) and analytically (Friedrich et al., 2008). Diversity can be achieved by defining a distance between behaviours that is maintained during the computation process; the Hamming distance is typically used to quantify it. Neighbouring behaviours are then clustered into behaviour sets within which the evolutionary computation is carried out, as sketched below. In a multi-agent problem such as a swarm, this seems a natural method to improve behaviour exploration and to overcome deception.
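
A minimal sketch of such clustering is given below, assuming behaviours are encoded as equal-length bit strings; the greedy grouping rule and the radius value are illustrative choices, not a prescribed niching method.

    def hamming(a, b):
        """Number of positions at which two equal-length encodings differ."""
        return sum(x != y for x, y in zip(a, b))

    def cluster_behaviours(behaviours, radius=2):
        """Greedily group behaviours whose Hamming distance to a cluster's seed
        is within the radius; each cluster forms one behaviour set to evolve."""
        clusters = []
        for b in behaviours:
            for cluster in clusters:
                if hamming(b, cluster[0]) <= radius:
                    cluster.append(b)
                    break
            else:
                clusters.append([b])   # b seeds a new behaviour set
        return clusters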

4.1.2 Novelty search

Besides maintaining the diversity of agents’ behaviours, which is an extension of fitness-based computation, another method to overcome deception is novelty search, suggested by Lehman and Stanley (2011). This approach quantifies the novelty of a new behaviour achieved in the exploration process using a novelty metric. The metric measures how far away (novel) the new behaviour is from the rest of the available behaviours. All unique or novel behaviours are placed in the behavioural space along with their novelty values. The value of a novel behaviour is characterised by the sparseness around each point in the behavioural space. Using this sparseness value, each behaviour achieved in each step of the computation is rewarded based on its novelty. The sparseness ρ at a point x is the average distance to its k-nearest neighbours μi:

(1) $\rho(x) = \frac{1}{k}\sum_{i=0}^{k} \operatorname{dist}(x, \mu_i)$.

Using novelty search, the exploration is expected to move towards new behaviours and avoid deception. A new behaviour is considered novel if it is located in a sparser region, and it then receives a greater novelty reward. A less novel behaviour will be rewarded less and grouped with similar behaviours. The novelty metric allows the computation process to identify where exploration has occurred and where it is currently exploring. By maximising the novelty metric, the search direction simply moves towards what is new. Thus, because no objective function is involved, novelty search can be used to overcome deception.
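
Equation (1) translates directly into code; the sketch below assumes behaviours are real-valued vectors compared with a Euclidean distance, although any behavioural distance can stand in for dist.

    import math

    def dist(x, y):
        """Assumed behavioural distance: Euclidean between behaviour vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def sparseness(x, archive, k=5):
        """Equation (1): average distance from x to its k-nearest neighbours in
        the archive of behaviours found so far; larger means more novel."""
        neighbours = sorted(dist(x, mu) for mu in archive)[:k]
        if not neighbours:
            return float("inf")        # first behaviour is maximally novel
        return sum(neighbours) / len(neighbours)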

4.2 Balancing exploration-exploitation

Through exploration, automatic design discovers new behaviours in their respective states. The best behaviour, the one that maximises the agent’s performance, is then selected through exploitation. Exploration and exploitation in a learning system should be in balance. Too much exploration will not lead the system to stability and may spend resources exploring unnecessary space that is not beneficial to the task. On the other hand, exploiting the current best behaviour can lead to deception, as discussed in the previous section. In fact, using the same principle as for overcoming deception, the exploitation problem can be solved by maintaining diversity and alternating the emphasis between exploration and exploitation based on a certain function or probability threshold.

Regarding the exploration-exploitation dilemma in multi-agent learning, one way to proceed is to utilise an ϵ-greedy exploration method. In this method, most of the time, with probability (1−ϵ), the algorithm exploits the current best behaviour, but once in a while it explores another behaviour at random with a small probability ϵ. An alternative is the Boltzmann distribution, a “softmax” approach, in which exploitation of the current best behaviour and exploration of alternative behaviours are balanced by a temperature parameter τ. The probability Pi of choosing behaviour μi among the available behaviours (μ1, μ2, μ3, …, μj) in a state S is:

(2) $P_i = \frac{e^{f(S,\mu_i)/\tau}}{\sum_j e^{f(S,\mu_j)/\tau}}$.
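
Both selection rules are short enough to sketch directly; here values stands for the estimates f(S, μi) of each available behaviour in the current state, and the ϵ and τ defaults are illustrative.

    import math
    import random

    def epsilon_greedy(values, epsilon=0.1):
        """Mostly exploit the best-valued behaviour; explore at random with
        small probability epsilon."""
        if random.random() < epsilon:
            return random.randrange(len(values))
        return max(range(len(values)), key=lambda i: values[i])

    def boltzmann(values, tau=0.5):
        """Equation (2): softmax selection. High tau flattens the distribution
        (more exploration); low tau sharpens it (more exploitation)."""
        weights = [math.exp(v / tau) for v in values]
        total = sum(weights)
        return random.choices(range(len(values)),
                              weights=[w / total for w in weights])[0]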

4.3 Achieving Nash equilibrium

The non-stationary behaviours of the multi-agent learning problem arise because all agents in the system are learning simultaneously. Each agent is, therefore, faced with a moving-target learning problem: the best behaviour changes as the other agents’ behaviours change. Moreover, in fully cooperative tasks, non-stationary behaviour is the main obstacle to achieving stability and adaptation in the learning system. To counter this, convergence to a stationary behaviour is needed, because it reduces the non-stationarity problem.

Achieving stable behaviour in a co-learning setting is not a trivial task, especially if each agent is oblivious to the others’ behaviour. Even if each agent has knowledge of the others’ behaviour, it is still difficult to select the best behaviour given the other agents’ behaviour. To counter this, the set of learned behaviours has to satisfy the Nash equilibrium criterion (Nash, 1950), the condition that each agent’s behaviour repertoire contains a best behaviour given the other agents’ behaviours (Busoniu et al., 2008; Tuyls and Weiss, 2012). However, achieving a Nash equilibrium in a co-learning process is not without problems, because each learning process is dynamic, and the learning direction taken by an agent is difficult to observe or invisible to the others.

To meet the Nash criterion in a co-learning setting, Bowling and Veloso (2001) proposed two properties that a multi-agent learning algorithm has to satisfy:

  1. Property 1 (Rationality). If the other agents’ behaviours converge to stationary behaviours, then the learning algorithm will converge to a behaviour that is a best response to their behaviours.

  2. Property 2 (Convergence). The learner will necessarily converge to a stationary behaviour. This property will usually be conditioned on the other agents using an algorithm from some class of learning algorithms.

The first property requires the agent to learn the best behaviour when the other agents apply stationary behaviours. This restricts a learning agent from settling on behaviour that is unresponsive to the other agents’ behaviour and gives the agent the ability to learn rationally. The latter property can be achieved only if an agent is learning with respect to rational agents or with respect to agents with stationary behaviours.

In combination, these two properties guarantee that the learner will converge to a stationary behaviour given the behaviour of the other agents. There is also a connection between these two properties and the Nash equilibrium: when all agents are rational and convergent, a Nash equilibrium exists, since the behaviour learned by each agent is the best response to the other agents’ behaviours.
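
The equilibrium condition can be illustrated with a toy two-agent check; the payoff matrices below are hypothetical, standing in for the learned values of joint behaviours in a fully cooperative task.

    def is_nash(payoff_a, payoff_b, i, j):
        """Joint behaviour (i, j) is a Nash equilibrium when neither agent can
        gain by unilaterally switching its behaviour."""
        best_for_a = all(payoff_a[i][j] >= payoff_a[k][j]
                         for k in range(len(payoff_a)))
        best_for_b = all(payoff_b[i][j] >= payoff_b[i][k]
                         for k in range(len(payoff_b[i])))
        return best_for_a and best_for_b

    # Fully cooperative task: both agents share one (hypothetical) payoff matrix.
    payoff = [[2, 0],
              [0, 1]]
    equilibria = [(i, j) for i in range(2) for j in range(2)
                  if is_nash(payoff, payoff, i, j)]
    print(equilibria)   # [(0, 0), (1, 1)]: two stable joint behaviours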

4.4 Adaptation

Adaptation to a dynamic environment is a major challenge in automatic swarm design. As discussed above, swarm design using evolutionary computation does not inherently adapt to the condition of the environment but rather follows the direction of the fitness function. On the other hand, RL, a common mechanism used in MASs, also faces many challenges in mapping behaviours to the states of a dynamic environment. Beyond improving both methods separately, combining evolutionary computation and a learning mechanism into one framework using the Lamarckian principle has shown a significant increase in performance in a dynamic environment (Giraud-Carrier, 2002; Le et al., 2009). An extended combination of evolution and learning is discussed in the following section as a new perspective for designing a swarm automatically.

5. Evolutionary-learning framework

While evolutionary computation is commonly used to define swarm behaviour automatically, as reviewed in a previous section, multi-agent learning methods offer flexibility with respect to a dynamic environment and other agents’ behaviour. Combining these two methods would be beneficial, incorporating heuristic search and adaptation into system control (Mukhlish et al., 2018).

Evolutionary computation is primarily inspired by Darwinian evolution (Goldberg, 1989; Holland, 1992). However, this theory does not utilise parameters from the environment in shaping an individual’s behaviour, nor does it pass them on to the next generation. Besides this theory, there is the theory of evolution proposed by Lamarck, which is beneficial in taking parameters from the environment that are later used to improve the next progeny. Specifically, a generation’s performance is enhanced using parental experience of nutrition, contaminants, nurturing behaviours and social stress or fear (Lim and Brunet, 2013). This principle is analogous to a learning mechanism such as RL. Moreover, studies conducted by Waddington (Waddington, 2012) showed that there is a layer above the gene, called the epigenetic layer in biology, that regulates its expression depending on external stimuli from the environment. The stimuli and experiences are stored in the form of an epigenetic layer, which acts as a regulatory function and is inherited by the next generation (Wang et al., 2017).

5.1 Epigenetics algorithm

The epigenetic layer is regarded as a tool for an agent to respond to environmental stimuli by modifying its phenotypic expression. This means that an agent has regulatory structures that receive input from the environment (external stimuli) and then use it to regulate genotypes as a form of expression regulation. Moreover, based on work in biology, these regulatory structures and their function are inherited between generations (Wang et al., 2017). Inspired by these mechanisms, an algorithm using this principle, known as the epigenetic algorithm (EpiAL), was proposed by Sousa and Costa (2011). In their model, the agent-environment interaction is depicted in Figure 6.

The EpiAL model is composed of two fundamental entities: the agent and the environment. External stimuli from the environment are received by the agent through its sensors. The stimuli are then passed to the epigenetic layer, which acts as a regulatory structure. Appropriate genetic codes are then selected and regulated given the received stimuli, and the selected genetic codes are expressed, modifying the current behaviour of the agent. In each cycle of the EpiAL algorithm, the performance of the selected behaviour is measured to calculate the relation between the stimuli and the genetic codes, called a methylation value. This methylation value is used to update the weights of the epigenetic layer.

Using this model, the authors were able to represent the regulatory function of the epigenetic layer as a mathematical model that responds to dynamic stimuli from the environment. Similar work in other studies also demonstrates that the utilisation of regulatory structures is key to handling a dynamic environment (La Cava and Spector, 2015; Tanev and Yuta, 2008; Periyasamy et al., 2008).

5.2 Behavioural space and environment state

The epigenetic layer translates the environment state into a behavioural space that contains expressions composed of the genes selected by the epigenetic layer (see Figure 7). This is analogous to the function of epigenetics, which has been studied in epigenome research as a product of adaptation to the environment. This perspective leads to the key idea of the regulatory function of the epigenetic layer.

5.3 Regulatory function

The regulatory function of the epigenetic layer can be derived from knowledge of the external stimuli from the environment. One common method of obtaining the function is through trial-and-error interaction with the environment, which is beneficial for acquiring temporal and spatial knowledge of its dynamics. The behaviour selected in each interaction phase is evaluated and rewarded based on its performance in the current environmental state. The received rewards are then used to form a relation between the individual’s behaviours and the environmental states. This mechanism results in a regulatory function that selects the set of genetic codes forming the behaviour that yields the maximum reward in a given environmental state. The model of this mechanism is depicted in Figure 8.

Moreover, the received reward is analogous to the methylation process in biology. In other words, behaviour is regulated based on the methylation values of the epigenetic layer. A gene carrying a high methylation value given the external stimulus is more likely to be expressed; a gene with a low methylation value is likely to be silenced. Furthermore, the reward system of machine learning can be utilised to methylate (strengthen) or demethylate (weaken) the methylation value. These methylation marks are inherited by the next generation through the evolutionary operators, namely selection, recombination, mutation and regeneration.
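
A hedged sketch of this reward-driven methylation is given below; the update rule, the [0, 1] range and the expression threshold are assumptions in the spirit of EpiAL rather than its exact formulation.

    def update_methylation(methyl, stimulus, gene, reward, rate=0.1):
        """methyl maps (stimulus, gene) to a methylation value in [0, 1];
        positive reward methylates (strengthens), negative demethylates (weakens)."""
        m = methyl.get((stimulus, gene), 0.5) + rate * reward
        methyl[(stimulus, gene)] = min(1.0, max(0.0, m))

    def expressed_genes(methyl, stimulus, genes, threshold=0.5):
        """Genes whose methylation under this stimulus exceeds the threshold are
        expressed; the rest are silenced, following the model described above."""
        return [g for g in genes if methyl.get((stimulus, g), 0.0) > threshold]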

5.4 Sensory component – state observer

In swarm robotics, however, knowledge of the external stimuli from the environment can only be measured using the sensors available on board, which are of course limited. Generally, such sensors can only quantify simple specific quantities such as temperature. Knowledge of the dynamics of the environment is nevertheless necessary to select appropriate behaviour, even though only limited sensory input is usually available. Thus, to capture the quality of the environment, a combination of sensor values is necessary to recognise the current state. However, since the dynamics of the environment are deterministically chaotic, as summarised by Edward Lorenz (1963): “Chaos: when the present determines the future, but the approximate present does not approximately determine the future”, specifying a priori an observer function that maps sensor values to environmental states is difficult. It is even more difficult to craft an appropriate function as the complexity of the environment increases. To make this possible in an automatic manner, the concept of the epigenetic layer should be improved by incorporating a learning mechanism, which has been found useful for predicting and observing chaotic systems (Pathak et al., 2017), to determine the dynamics of the environment. This can be done by embedding a state observer into the epigenetic layer. To be applicable to automatic design, state recognition is also obtained from trial-and-error interaction with the environment, as shown in Figure 9. This complete framework will be investigated in our future research.
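
One speculative realisation of such a state observer, consistent with trial-and-error state recognition, is simple online clustering of sensor vectors; the distance measure and radius below are assumptions, and the framework itself does not fix this choice.

    import math

    def observe_state(sensors, prototypes, radius=1.0):
        """Map a raw sensor vector to the index of the nearest known state
        prototype, creating a new discrete state when the reading is far from
        all prototypes (online clustering; a hypothetical observer)."""
        def d(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        for idx, proto in enumerate(prototypes):
            if d(sensors, proto) <= radius:
                return idx
        prototypes.append(list(sensors))   # unseen conditions define a new state
        return len(prototypes) - 1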

The utilisation of the epigenetic layer gives the swarm the ability to incorporate knowledge of an environment and the related behaviours into its strategy. The combination of selected genes represents the phenotype. The distance between behaviours can be calculated using the metric given by Equation (1). This metric is beneficial for limiting the duplication of similar regulatory functions in the epigenetic layer and for preventing the agent from relearning similar behaviours.

5.5 Collective learning and evolution

Evolutionary computation provides a medium for information sharing between agents through selection, recombination, mutation and regeneration. Along with the corresponding genetic values, epigenetics is also inherited by the next generation through the evolutionary process. Collective learning occurs in this process through recombination. All behaviours with similar values are clustered into groups based on the novelty metric discussed above. Behaviours in the same group are then evolved together, and a new behaviour with a significantly different (novel) value is inserted into the behaviour set to be explored. This method acts as selective pressure on the behaviour sets in the behavioural space to overcome the deception issue. Another challenge to be resolved is the exploration-exploitation dilemma. To balance exploration and exploitation, an ϵ-greedy approach can be utilised. By combining exploration-exploitation balancing with the novelty metric, the exploration is expected to move towards new behaviours.

6. Future work

This approach offers a promising way to improve swarm capability using automatic design, especially by extending evolutionary swarms with learning. Since learning capability is the key for a swarm to tackle a dynamic environment, it is important to investigate learning mechanisms for swarms (Mukhlish et al., 2018). Once the learning mechanism is identified, future implementations of swarm learning can be explored in swarm robotics research. To move forward in this direction, several steps need to be achieved.

Evolutionary-learning framework formulation: incorporating a learning mechanism into evolutionary computation can be achieved by extending the EpiAL model of Sousa and Costa (2011). The main goal of this framework will be to respond to stimuli from the environment based on experience and to inherit the behaviour through an evolutionary process. The response is chosen from among the available responses (actions) based on performance in the current state and is built through interaction with the environment as a learning process. These behaviours will then be inherited by the next generation through an evolutionary process. Specifically, this framework will be applied to and analysed on an individual agent.

Evolutionary swarm learning framework formulation: once the evolutionary-learning framework for an individual agent is achieved, a parallel framework will be formulated. Single-agent learning mechanisms will be adapted to multi-agent learning using collaborative and competitive learning: each agent will learn competitively as an individual to achieve the best response, collaboratively improving the swarm’s adaptation to the environment.

Validating the proposed framework: when the required framework and algorithms are mathematically proven and implemented, it will be possible to simulate the framework extensively in order to measure its performance. The framework has to be evaluated in a dynamic environment testbed, and one way to achieve a dynamic environment is through simulation (Page and Mukhlish, 2017). The simulation will be designed as an unknown environment to evaluate the capability of the swarm to survive and thrive.

The potential case to be used in future work is a search and rescue mission on the ocean using a swarm of unmanned aerial vehicles (UAVs), which has been studied in our previous works (Chi, 2014; de Crespigny, 2015; Wright, 2015). Search and rescue on the ocean is known to be very challenging and requires a number of complex actions demanding material and human resources (Vidan et al., 2016). However, it is essentially a two-dimensional problem that also avoids the blanking that occurs in land-based search and rescue. The dynamics of the environmental setup in the simulation would depend on wind direction, sun position, sea state and the drift of the victims. The sensory components available on the UAVs would be a camera, to capture the sea surface and detect floating victims, and a proximity sensor to detect nearby UAVs. As an example of sensing the dynamics of the environment, the image quality captured by the camera depends on environmental properties such as the sun’s reflection off the sea surface. This setup is considered adequate to provide environmental dynamics for validating the evolutionary-learning framework, as discussed by Helleboogh et al. (2006). Moreover, those authors stated that a simulated dynamic environment should change its dynamics in ways beyond the agents’ actions: the dynamism each agent experiences in a dynamic environment depends on the actions of other autonomous agents and on environmental properties changing over time.

7. Conclusion

Swarm robotics is currently attracting a lot of attention because of its flexibility, scalability and robustness. Both approaches to designing swarm robotics, namely behaviour-based and automatic design, are discussed in this paper, along with the challenges related to current developments in both design methods and the directions that could be taken for further development and improvement. Development of behaviour-based and automatic design is necessary to provide options for designing swarm robotics behaviour. For behaviour-based design especially, further development should strengthen the foundation models of swarm behaviour and provide a useful validation platform for social-animal behavioural models in biological studies.

At the time this paper was written, the development of automatic design faced several challenges that are inherently computational, such as deception, the exploration-exploitation dilemma, non-stationary behaviour and the curse of dimensionality. All these challenges are mainly caused by the multi-agent setting and the dynamic environment. Although automatic swarm robotics designs using either evolutionary or learning mechanisms have shown promising results in many research reports, work on tackling the challenges stated above is required before swarm robotics can be implemented in real applications within dynamic environments.

In general, building a strategy to achieve an advanced self-organising swarm in a dynamic environment is not a trivial task because of its complexity. Although behaviour-based methods inspired by animal behaviour are mature, they may solve only a specific problem in a stationary environment. This has created a need for automatic methods that generate a more flexible and robust swarm design. As an advancement in automatic design methods, incorporating a learning mechanism into an evolutionary swarm is a possible way to bring the dynamics of the environment into the design process.

In this paper, a novel evolutionary swarm learning method using epigenetic inheritance is proposed, and a possible learning mechanism is identified as a response to a dynamic environment. Simulation, consisting of agent-based modelling and dynamic environments, should initially be utilised to validate and analyse the performance of the proposed framework. Finally, the research is expected to contribute to automatic design methods for swarm robotics by adding learning capability and environment-aware behaviour.

Figures

Figure 1: PFSM with constant probability transition value

Figure 2: PFSM with dynamic probability transition value

Figure 3: Evolutionary swarm robotics

Figure 4: Multi-agent reinforcement learning

Figure 5: Classification of swarm robotics design

Figure 6: EpiAL conceptual model

Figure 7: Epigenetic layer maps environment’s states to behaviours

Figure 8: Learning mechanism for epigenetic layer

Figure 9: Cascade learning for epigenetic layer

References

Ame, J.M., Halloy, J., Rivault, C., Detrain, C. and Deneubourg, J.L. (2006), “Collegial decision making based on social amplification leads to optimal group formation”, Proceedings of the National Academy of Sciences, Vol. 103 No. 15, pp. 5835-5840, available at: https://doi.org/10.1073/pnas.0507877103

Baldassarre, G., Nolfi, S. and Parisi, D. (2003), “Evolving mobile robots able to display collective behaviors”, Artificial Life, Vol. 9 No. 3, pp. 255-267, available at: https://doi.org/10.1162/106454603322392460

Bayindir, L. (2016), “A review of swarm robotics tasks”, Neurocomputing, Vol. 172, January, pp. 292-321, available at: https://doi.org/10.1016/j.neucom.2015.05.116

Bogue, R. (2008), “Swarm intelligence and robotics”, Industrial Robot: An International Journal, Vol. 35 No. 6, pp. 488-495, available at: https://doi.org/10.1108/01439910810909475

Bowling, M. and Veloso, M. (2001), “Rational and convergent learning in stochastic games”, Proceedings of the 17th International Joint Conference on Artificial Intelligence, Vol. 2, Morgan Kaufmann Publishers, San Francisco, CA, pp. 1021-1026, available at: http://dl.acm.org/citation.cfm?id=1642194.1642231

Brambilla, M., Ferrante, E., Birattari, M. and Dorigo, M. (2013), “Swarm robotics: a review from the swarm engineering perspective”, Swarm Intelligence, Vol. 7 No. 1, pp. 1-41, available at: https://doi.org/10.1007/s11721-012-0075-2

Busoniu, L., Babuska, R. and De Schutter, B. (2008), “A comprehensive survey of multiagent reinforcement learning”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol. 38 No. 2, pp. 156-172, available at: https://doi.org/10.1109/TSMCC.2007.913919

Camazine, S. (Ed.) (2001), Self-Organization in Biological Systems. Princeton Studies in Complexity, Princeton University Press, Princeton, NJ.

Chi, T.-C. (2014), “Evolutionary adaptation and optimisation in heterogeneous and homogeneous aerial search vehicles”, Master of Engineering thesis, School of Mechanical and Manufacturing, Faculty of Engineering, The University of New South Wales, Sydney.

Chi, T.Z., Cheng, H., Page, J.R. and Ahmed, N.A. (2014), “Evolving swarm of UAVs”, Advances in Aircraft and Spacecraft Science, Vol. 1 No. 2, pp. 219-232, available at: https://doi.org/10.12989/aas.2014.1.2.219

Correll, N. and Martinoli, A. (2011), “Modeling and designing self-organized aggregation in a swarm of miniature robots”, The International Journal of Robotics Research, Vol. 30 No. 5, pp. 615-626, available at: https://doi.org/10.1177/0278364911403017

Couzin, I.D., Krause, J., Franks, N.R. and Levin, S.A. (2005), “Effective leadership and decision-making in animal groups on the move”, Nature, Vol. 433 No. 7025, pp. 513-516, available at: https://doi.org/10.1038/nature03236

Crespigny, A.C. de (2015), “Swarm learning techniques”, Bachelor of Engineering thesis, School of Mechanical and Manufacturing, Faculty of Engineering, The University of New South Wales, Kensington, Sydney.

Darwin, C.R. (1872), “Natural selection”, The Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, Chapter 4, 6th ed., John Murray, London, pp. 77-119.

Deneubourg, J.-L., Goss, S., Pasteels, J.M., Fresneau, D. and Lachaud, J.-P. (1987), “Self-organization mechanisms in ant societies (II): learning in foraging and division of labor”, in Pasteels, J.M. and Deneubourg, J.-L. (Eds), From Individual to Collective Behavior in Social Insects Experientia Supplementum, Vol. 54, Birkhäuser, Basel, pp. 177-196.

Eiben, A.E. and Smith, J.E. (2015), Introduction to Evolutionary Computing. Natural Computing Series, Springer, Berlin and Heidelberg, available at: http://link.springer.com/10.1007/978-3-662-44874-8

Fetecau, R.C. (2011), “Collective behavior of biological aggregations in two dimensions: a nonlocal kinetic model”, Mathematical Models and Methods in Applied Sciences, Vol. 21 No. 7, pp. 1539-1569, available at: https://doi.org/10.1142/S0218202511005489

Francesca, G., Brambilla, M., Trianni, V., Dorigo, M. and Birattari, M. (2012), “Analysing an evolved robotic behaviour using a biological model of collegial decision making”, in Ziemke, T., Balkenius, C. and Hallam, J. (Eds), From Animals to Animats 12, Vol. 7426, Springer, Berlin and Heidelberg, pp. 381-390, available at: http://link.springer.com/10.1007/978-3-642-33093-3_38

Friedrich, T., Oliveto, P.S., Sudholt, D. and Witt, C. (2008), “Theoretical analysis of diversity mechanisms for global exploration”, GECCO ’08 Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, ACM Press, New York, NY, pp. 945-952, available at: https://doi.org/10.1145/1389095.1389276

Garnier, S., Gautrais, J., Asadpour, M., Jost, C. and Theraulaz, G. (2009), “Self-organized aggregation triggers collective decision making in a group of cockroach-like robots”, Adaptive Behavior, Vol. 17 No. 2, pp. 109-133, available at: https://doi.org/10.1177/1059712309103430

Gauci, M., Chen, J., Dodd, T.J. and Groß, R. (2014), “Evolving aggregation behaviors in multi-robot systems with binary sensors”, in Ani Hsieh, M. and Chirikjian, G. (Eds), Distributed Autonomous Robotic Systems. Springer Tracts in Advanced Robotics, Springer, Berlin and Heidelberg, pp. 355-367, available at: https://link.springer.com/chapter/10.1007/978-3-642-55146-8_25

Gauci, M., Chen, J., Li, W., Dodd, T.J. and Gross, R. (2014), “Clustering objects with robots that do not compute”, Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems. AAMAS ’14: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp. 421-428, available at: http://dl.acm.org/citation.cfm?id=2615731.2615800

Giraud-Carrier, C. (2002), “Unifying learning with evolution through Baldwinian evolution and lamarckism”, in Zimmermann, H.J., Tselentis, G., van Someren, M. and Dounias, G. (Eds), Advances in Computational Intelligence and Learning. International Series in Intelligent Technologies, Springer, Dordrecht, pp. 159-168, available at: https://link.springer.com/chapter/10.1007/978-94-010-0324-7_11

Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA.

Groß, R. and Dorigo, M. (2004), “Cooperative transport of objects of different shapes and sizes”, in Dorigo, M., Birattari, M., Blum, C., Gambardella, L.M., Mondada, F. and Stützle, T. (Eds), Ant Colony Optimization and Swarm Intelligence. Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, pp. 106-117, available at: https://link.springer.com/chapter/10.1007/978-3-540-28646-2_10

Groß, R. and Dorigo, M. (2009), “Towards group transport by swarms of robots”, International Journal of Bio-Inspired Computation, Vol. 1 Nos 1/2, p. 1, available at: https://doi.org/10.1504/IJBIC.2009.022770

Hartmann, V. (2005), “Evolving agent swarms for clustering and sorting”, GECCO ’05 Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, ACM Press, New York, NY, pp. 217-224, available at: https://doi.org/10.1145/1068009.1068042

Hayes, A.T., Martinoli, A. and Goodman, R.M. (2003), “Swarm robotic odor localization: off-line optimization and validation with real robots”, Robotica, Vol. 21 No. 4, pp. 427-441, available at: https://doi.org/10.1017/S0263574703004946

Helleboogh, A., Vizzari, G., Uhrmacher, A. and Michel, F. (2006), “Modeling dynamic environments in multi-agent simulation”, Autonomous Agents and Multi-Agent Systems, Vol. 14 No. 1, pp. 87-116, available at: https://doi.org/10.1007/s10458-006-0014-y

Holland, J.H. (1992), Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1st MIT Press ed., Complex Adaptive Systems, MIT Press, Cambridge, MA.

Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996), “Reinforcement learning: a survey”, Journal of Artificial Intelligence Research, Vol. 4 No. 1, pp. 237-285.

Kalyanakrishnan, S. and Stone, P. (2007), “Batch reinforcement learning in a complex domain”, ACM Press, New York, NY, pp. 662-669, available at: https://doi.org/10.1145/1329125.1329241

Khatib, O. (1986), “Real-time obstacle avoidance for manipulators and mobile robots”, The International Journal of Robotics Research, Vol. 5 No. 1, pp. 90-98, available at: https://doi.org/10.1177/027836498600500106

Kuyucu, T., Tanev, I. and Shimohara, K. (2012), “Evolutionary optimization of pheromone-based stigmergic communication”, in Di Chio, C., Agapitos, A., Cagnoni, S., Cotta, C., de Vega, F.F., Di Caro, G.A., Drechsler, R., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Langdon, W.B., Merelo-Guervós, J.J., Preuss, M., Richter, H., Silva, S., Simões, A., Squillero, G., Tarantino, E., Tettamanzi, A.G.B., Togelius, J., Urquhart, N., Şima Uyar, A. and Yannakakis, G.N. (Eds), Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, pp. 63-72, available at: https://link.springer.com/chapter/10.1007/978-3-642-29178-4_7

La Cava, W. and Spector, L. (2015), “Inheritable epigenetics in genetic programming”, in Riolo, R., Worzel, W.P. and Kotanchek, M. (Eds), Genetic Programming Theory and Practice XII, Springer International Publishing, Cham, pp. 37-51, available at: https://doi.org/10.1007/978-3-319-16030-6_3

Labella, T.H., Dorigo, M. and Deneubourg, J.-L. (2004), “Efficiency and task allocation in prey retrieval”, in Ijspeert, A.J., Murata, M. and Wakamiya, N. (Eds), Biologically Inspired Approaches to Advanced Information Technology, Vol. 3141, Springer, Berlin and Heidelberg, pp. 274-289, available at: http://link.springer.com/10.1007/978-3-540-27835-1_21

Labella, T.H., Dorigo, M. and Deneubourg, J.-L. (2006), “Division of labor in a group of robots inspired by ants’ foraging behavior”, ACM Transactions on Autonomous and Adaptive Systems, Vol. 1 No. 1, pp. 4-25, available at: https://doi.org/10.1145/1152934.1152936

Le, M.N., Ong, Y.-S., Jin, Y. and Sendhoff, B. (2009), “Lamarckian memetic algorithms: local optimum and connectivity structure analysis”, Memetic Computing, Vol. 1 No. 3, p. 175, available at: https://doi.org/10.1007/s12293-009-0016-9

Lehman, J. and Stanley, K.O. (2011), “Abandoning objectives: evolution through the search for novelty alone”, Evolutionary Computation, Vol. 19 No. 2, pp. 189-223.

Li, L., Martinoli, A. and Abu-Mostafa, Y.S. (2004), “Learning and measuring specialization in collaborative swarm systems”, Adaptive Behavior, Vol. 12 Nos 3/4, pp. 199-212, available at: https://doi.org/10.1177/105971230401200306

Lim, J.P. and Brunet, A. (2013), “Bridging the transgenerational gap with epigenetic memory”, Trends in Genetics: TIG, Vol. 29 No. 3, pp. 176-186, available at: https://doi.org/10.1016/j.tig.2012.12.008

Liu, W., Winfield, A.F.T., Sa, J., Chen, J. and Dou, L. (2007), “Towards energy optimization: emergent task allocation in a swarm of foraging robots”, Adaptive Behavior, Vol. 15 No. 3, pp. 289-305, available at: https://doi.org/10.1177/1059712307082088

Lorenz, E.N. (1963), “Deterministic nonperiodic flow”, Journal of the Atmospheric Sciences, Vol. 20 No. 2, pp. 130-141, available at: https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

Mahfoud, S. (1997), “Niching methods”, in Bäck, T., Fogel, D.B. and Michalewicz, Z. (Eds), Handbook of Evolutionary Computation, Institute of Physics Publishing and Oxford University Press, Bristol and Philadelphia, PA, pp. C6.1:1-C6.1:4.

Minsky, M. (1967), Computation: Finite and Infinite Machines, Prentice-Hall, Englewood Cliffs, NJ.

Mitchell, T.M. (1997), Machine Learning, McGraw-Hill Series in Computer Science, McGraw-Hill, New York, NY.

Mukhlish, F., Page, J. and Bain, M. (2018), “Evolutionary-learning framework for swarm robotics using epigenetics layer”, Vol. 14, International Society of Intelligent Unmanned Systems, Jeju, p. 86.

Nash, J.F. (1950), “Equilibrium points in N-person games”, Proceedings of the National Academy of Sciences, Vol. 36 No. 1, pp. 48-49, available at: https://doi.org/10.1073/pnas.36.1.48

Nolfi, S., Bongard, J., Husbands, P. and Floreano, D. (2016), “Evolutionary robotics”, in Siciliano, B. and Khatib, O. (Eds), Springer Handbook of Robotics, Springer International Publishing, Cham, pp. 2035-2068, available at: http://link.springer.com/10.1007/978-3-319-32552-1_76

Nouyan, S., Campo, A. and Dorigo, M. (2008), “Path formation in a robot swarm”, Swarm Intelligence, Vol. 2 No. 1, pp. 1-23, available at: https://doi.org/10.1007/s11721-007-0009-6

Page, J. and Mukhlish, F. (2017), Simulation: The Only Way to Investigate Self-Organising Swarms, Simulation Australia, Sydney.

Pathak, J., Lu, Z., Hunt, B.R., Girvan, M. and Ott, E. (2017), “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data”, Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 27 No. 12, p. 121102, available at: https://doi.org/10.1063/1.5010300

Periyasamy, S., Gray, A. and Kille, P. (2008), “The epigenetic algorithm”, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), IEEE, Hong Kong, pp. 3228-3236, available at: https://doi.org/10.1109/CEC.2008.4631235

Ranjbar-Sahraei, B., Weiss, G. and Nakisaee, A. (2012), “A multi-robot coverage approach based on stigmergic communication”, in Timm, I.J. and Guttmann, C. (Eds), Multiagent System Technologies, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, pp. 126-138, available at: https://doi.org/10.1007/978-3-642-33690-4_13

Reynolds, C.W. (1987), “Flocks, herds and schools: a distributed behavioral model”, ACM SIGGRAPH Computer Graphics, Vol. 21 No. 4, pp. 25-34, available at: https://doi.org/10.1145/37402.37406

Riedmiller, M., Gabel, T., Hafner, R. and Lange, S. (2009), “Reinforcement learning for robot soccer”, Autonomous Robots, Vol. 27 No. 1, pp. 55-73, available at: https://doi.org/10.1007/s10514-009-9120-4

Şahin, E. (2005), “Swarm robotics: from sources of inspiration to domains of application”, in Şahin, E. and Spears, W.M. (Eds), Swarm Robotics, Vol. 3342, Springer, Berlin and Heidelberg, pp. 10-20, available at: https://doi.org/10.1007/978-3-540-30552-1_2

Sareni, B. and Krahenbuhl, L. (1998), “Fitness sharing and niching methods revisited”, IEEE Transactions on Evolutionary Computation, Vol. 2 No. 3, pp. 97-106, available at: https://doi.org/10.1109/4235.735432

Sen, S. and Weiss, G. (1999), “Learning in multiagent systems”, in Weiss, G. (Ed.), Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, MIT Press, Cambridge, MA, pp. 259-298.

Sousa, J.A.B. and Costa, E. (2011), “Designing an epigenetic approach in artificial life: the EpiAL model”, in Filipe, J., Fred, A. and Sharp, B. (Eds), Agents and Artificial Intelligence, Vol. 129, Springer, Berlin and Heidelberg, pp. 78-90, available at: http://link.springer.com/10.1007/978-3-642-19890-8_6

Soysal, O. and Şahin, E. (2005), “Probabilistic aggregation strategies in swarm robotic systems”, Proceedings 2005 IEEE Swarm Intelligence Symposium, IEEE, Pasadena, CA, pp. 325-332, available at: https://doi.org/10.1109/SIS.2005.1501639

Spears, W.M., Spears, D.F., Hamann, J.C. and Heil, R. (2004), “Distributed, physics-based control of swarms of vehicles”, Autonomous Robots, Vol. 17 Nos 2/3, pp. 137-162, available at: https://doi.org/10.1023/B:AURO.0000033970.96785.f2

Stone, P., Sutton, R.S. and Kuhlmann, G. (2005), “Reinforcement learning for RoboCup soccer keepaway”, Adaptive Behavior, Vol. 13 No. 3, pp. 165-188, available at: https://doi.org/10.1177/105971230501300301

Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA.

Tanev, I. and Yuta, K. (2008), “Epigenetic programming: genetic programming incorporating epigenetic learning through modification of histones”, Information Sciences, Vol. 178 No. 23, pp. 4469-4481, available at: https://doi.org/10.1016/j.ins.2008.07.027

Trianni, V., Groß, R., Labella, T.H., Şahin, E. and Dorigo, M. (2003), “Evolving aggregation behaviors in a swarm of robots”, in Banzhaf, W., Ziegler, J., Christaller, T., Dittrich, P. and Kim, J.T. (Eds), Advances in Artificial Life, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, pp. 865-874, available at: https://link.springer.com/chapter/10.1007/978-3-540-39432-7_93

Tuci, E., Ampatzis, C., Trianni, V., Christensen, A.L. and Dorigo, M. (2008), “Self-assembly in physical autonomous robots: the evolutionary robotics approach”, Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, Winchester, MIT Press, Cambridge, MA, pp. 616-623, available at: http://iridia0.ulb.ac.be/IridiaTrSeries/IridiaTr2008-007r001.pdf

Tuyls, K. and Weiss, G. (2012), “Multiagent learning: basics, challenges, and prospects”, AI Magazine, Vol. 33 No. 3, p. 41, available at: https://doi.org/10.1609/aimag.v33i3.2426

Vidan, P., Hasanspahić, N. and Grbić, T. (2016), “Comparative analysis of renowned softwares for search and rescue operations”, NAŠE MORE: Znanstveno-Stručni Časopis Za More i Pomorstvo, Vol. 63 No. 2, pp. 73-80, available at: https://doi.org/10.17818/NM/2016/2.6

Waddington, C.H. (2012), “The epigenotype”, International Journal of Epidemiology, Vol. 41 No. 1, pp. 10-13, available at: https://doi.org/10.1093/ije/dyr184

Wang, Y., Liu, H. and Sun, Z. (2017), “Lamarck rises from his grave: parental environment-induced epigenetic inheritance in model organisms and humans”, Biological Reviews, Vol. 92 No. 4, pp. 2084-2111, available at: https://doi.org/10.1111/brv.12322

Wright, A.J. (2015), “Clonal selection for the evolution of heterogeneous unmanned aerial vehicle swarms”, Bachelor of Engineering thesis, School of Mechanical and Manufacturing Engineering, Faculty of Engineering, The University of New South Wales, Kensington, Sydney.

Yi, X., Zhu, A., Yang, S.X. and Luo, C. (2017), “A bio-inspired approach to task assignment of swarm robots in 3-D dynamic environments”, IEEE Transactions on Cybernetics, Vol. 47 No. 4, pp. 974-983, available at: https://doi.org/10.1109/TCYB.2016.2535153

Corresponding author

Faqihza Mukhlish can be contacted at: f.mukhlish@student.unsw.edu.au