The Small Loop Challenge

The Java applet should appear above this line. It was successfully tested with Firefox 22.0 and Chrome 28.0. If it does not appear, it probably has a compatibility problem with your browser. Please try another browser, run the model on your computer, or simply watch the demo video on our blog.

To run it on your computer: download NetLogo 4.1.3; download the model file: Ernest_V6-smallLoop3.nlogo; download the imos.jar and ernest.jar that constitute the IMOS extension. More information at the imos-netlogo forge.


This NetLogo model provides a platform to investigate the "Small Loop Problem".

The Small Loop Problem is a challenge that we submit to the community of artificial developmental cognition. This challenge consists of implementing an artificial agent that would "smartly" organize its behavior through autonomous interaction with the "Small Loop Environment".

The Small Loop Environment is the environment displayed in this model. This environment, together with a set of predefined "possibilities of interaction" afforded to the agent, forms the "Small Loop Platform".

The Small Loop Platform offers six possible actions: try to move one square forward (succeeds if the square ahead is empty), turn 90° left, turn 90° right, touch the front square, touch the left square, and touch the right square. Each interaction returns a single bit of feedback that tells the agent whether it interacted with a wall or an empty square (i.e., "step" vs "bump", "touch wall" vs "touch empty"; "turn" actions return a constant binary value). The agent has no way of "perceiving" the environment other than this single bit received when enacting these interactions.

The Small Loop Problem thus consists of implementing an agent capable of "smart" organization of behavior through the ten interactions afforded by the Small Loop Platform (4 actions × 2 possible feedback values + 2 "turn" actions × 1 constant feedback).
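The count of ten interactions can be sketched as follows. This is an illustrative enumeration only; the labels below are not the model's internal encoding (the model uses single-character codes such as ">" and "-").

```python
# Count the platform's interactions: four actions that return a sensed
# bit ("empty" vs "wall"), plus two turn actions with a constant feedback.
sensed_actions = ["move forward", "touch front", "touch left", "touch right"]
turn_actions = ["turn left", "turn right"]

interactions = [(a, fb) for a in sensed_actions for fb in ("empty", "wall")]
interactions += [(a, "constant") for a in turn_actions]

print(len(interactions))  # 4 * 2 + 2 * 1 = 10
```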

By "smart" organization of behavior, we mean behavior that consists of autonomously discovering, learning, and exploiting regularities of interaction to satisfy the agent's preferences, with minimal presuppositions about the environment encoded in the agent beforehand.

Minimal initial presupposition implies, in particular, that the interpretation of interactions must not be hard-coded in the agent but rather learned through experience. For instance, the interaction labeled "touch front wall" could be relabeled "turn left" in a new experiment and still result in the same behavior.

The agent's self-motivation is defined by values associated with interactions. These values are fixed and defined by the experimenter. The Small Loop Problem thus consists of implementing a mechanism that tends to enact interactions with high values and to avoid interactions with negative values.

We defined the reference values as follows: "move forward": 5; "bump wall": -10; "turn": -3; "touch empty square": -1; "touch wall": -2. With these values, the agent must try to maximize moving forward while avoiding bumping and turning. The agent thus needs to learn to use "touch" to perceive its environment, and to turn only in appropriate categories of situations (because "touch" is "cheaper" than "bump" or "turn").
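A small worked example shows why these values reward active perception. Assuming (hypothetically) that the square ahead is a wall with probability 0.5, a policy that touches before moving earns more than a policy that moves blindly:

```python
# Reference values from the model (see the sliders in the interface).
values = {"step": 5, "bump": -10, "turn": -3, "touch_empty": -1, "touch_wall": -2}

# Illustrative assumption: the square ahead is a wall with probability p.
p = 0.5

# Blind policy: always try to move forward.
blind = (1 - p) * values["step"] + p * values["bump"]

# Touch-first policy: touch ahead, then move only if the square is empty.
touch_first = (1 - p) * (values["touch_empty"] + values["step"]) + p * values["touch_wall"]

print(blind, touch_first)  # -2.5 vs 1.0: touching first pays off
```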

While we defined reference values, the agent's algorithm must not presuppose them; rather, it should adapt to any set of values. Trivial examples are cases where positive values are associated with turning, bumping, or touching: the agent would learn to keep spinning in place, bumping, or touching indefinitely.

While the Small Loop Problem looks simplistic, we argue that it raises important challenges regarding developmental cognition: specifically, the challenges of autonomously learning hierarchical sequences of interactions while simultaneously learning spatial regularities of interaction. For instance, the agent should learn that, when a wall has been touched on the left, turning left and trying to move forward will result in bumping. Please see our publications for more details on why traditional artificial learning techniques (such as reinforcement learning or Bayesian sequence learning) do not address these requirements.
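The left-wall example above can be pictured as a learned mapping from a sensed context to the predicted outcome of a candidate sequence. This is a hypothetical sketch, not the model's actual data structures:

```python
# Hypothetical spatial regularity: a wall touched on the left predicts
# that turning left and then trying to move forward will bump.
learned = {
    ("touch_left", "wall"): {("turn_left", "move_forward"): "bump"},
}

context = ("touch_left", "wall")      # what was just sensed
plan = ("turn_left", "move_forward")  # candidate sequence of actions
print(learned[context][plan])  # -> bump: the agent should avoid this plan
```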


This NetLogo model demonstrates our current response to the Small Loop Problem. This agent learns that it should touch ahead before trying to move forward and not move forward if it touched a wall. Also, it learns that it should touch aside before turning and not turn in that direction if it touched a wall. This result is obtained by an original algorithm published in Georgeon and Ritter (2012).


* Click "Run" to run the agent.
* Click "Step" to run the agent one step at a time.
* Click "Re-initialize" to restart the agent from scratch (clear the agent's memory).
* Click "Reset values" to reset the values of interactions to default.

* Use the "Time interval" cursor to slow down the agent.
* Use the 5 "value cursors" to define the values of interactions: "step" (move one square forward), "bump" wall, "turn" left or right, "touch empty" square, "touch wall". Then click "Re-initialize" to restart the agent with the values you have defined (values take effect when the agent is restarted).

* Use the "Initial-position" chooser to select an initial position, then click "Place Agent" and possibly "Rotate" to start the experiment from different initial conditions (the "Place Agent" button also re-initializes the agent).

When the agent touches a square, this square flashes yellow. When the agent bumps into a square, this square flashes red.


With the default values, notice that the agent learns to use the "touch" interactions as active perception, which yields more forward moves, less bumping, and fewer turns towards walls.

Notice that, once the touching behavior has been learned, the agent always uses it. A smarter agent would learn to renounce touching when it is on the long edge of the loop, because it could reckon that moving forward is safe in this situation.

Notice that this agent is deterministic. It will always behave the same when starting from the same initial conditions.

Start the experiment from position [1 1], orientation upwards. Notice that the agent reaches a stable behavior from approximately tick 300.

Start the experiment from position [1 3], orientation upwards. Notice that the behavior gets organized from approximately tick 600. Before that, the agent has trouble dealing with the upper-right corner of the loop.

An agent that fully answers the Small Loop Challenge should reliably demonstrate smarter behavior in the upper-right corner by reckoning with the two-dimensional structure of the environment.


Try different interaction values and observe that the agent organizes its behavior to obtain interactions with positive values. Note that the agent does not always find the sequence that leads to the highest positive values.

Click on the grid to add or remove walls while the agent is running, and see how the agent deals with unexpected conditions.


This model can be extended either by completing the NetLogo code or by replacing the IMOS extension by your own extension.

We suspect that fully solving the Small Loop Problem would require making the agent capable of some form of rudimentary reflexivity. This is why we find this challenge interesting!


This model uses the IMOS extension (Intrinsic Motivation System).


This model was implemented by Olivier Georgeon (Université de Lyon / CNRS) and Ilias Sakellariou (University of Macedonia), using the IMOS NetLogo extension based on Georgeon and Ritter's paper "An Intrinsically Motivated Schema Mechanism to Model and Simulate Emergent Cognition" Cognitive Systems Research 15-16, pp. 73-92 (2012).


extensions [imos]

;;; Let's keep things organised

breed [ernests ernest]
breed [targets target]
breed [markers marker]
breed [trailmarks trailmark]

directed-link-breed [left-eyes left-eye]
directed-link-breed [right-eyes right-eye]

ernests-own [action status stimuli satisfaction]
patches-own [last-interaction old-color]
trailmarks-own [when-created]

globals [rcount ##int_targ_status ##target-color bumps diffbumps]

;;; called when the model is first initialised.
to startup
end

to setup
  set rcount 0
  create-markers 1 [set shape "grid" set color black set heading 0 set hidden? true]
  set ##target-color sky
  set bumps 0
  set diffbumps 0
end

to setup-patches
  ask patches [ set pcolor last-interaction ]
end

;;; Sets default values to the learning component of Ernest.
to default
  set step 5;-1
  set bump -10;-8
  set turn -3;0
  set touch-empty -1
  set touch-wall -2
end

;;; Simple procedure to set ernest and the target in 
;;; prespecified places to learn a new strategy. 
;;; Learns with default values for interactions. 
;to origin [ernx erny erno]
 ; default
;  ask ernests [setxy ernx erny set color 27]

to origin 
  ask ernests [setxy first initial-position item 1 initial-position set color 27]
end

to run-experiment
  wait time-interval
  ask patches with [pcolor = yellow or pcolor = 44 or pcolor = red]  [set pcolor old-color]
  ask trailmarks with [when-created + 10 = ticks] [die]  ;; this removes trailmarks. Change 10 to leave another trailmark.
  ask ernests [do]
end

to plot-things
  if (ticks mod 10 = 0)
    [set-current-plot "Bump Interactions Count"
     set-current-plot-pen "Number of Bumps"
     plot bumps
     set-current-plot-pen "Bumps over last 10 ticks"
     plot (bumps - diffbumps)
     set diffbumps bumps]
end

;; set up the environment (to look the same as the original ernest environment.)
to draw-environment
  ;;create-markers 1 [set shape "grid" set color grey set heading 0 set hidden? true]
  ;;foreach [(list pxcor pycor)] of patches [ask markers [move-to patch first ? item 1 ? stamp]]
  foreach [self] of patches [ask markers [move-to ? stamp] ask ? [set pcolor white] ]
  ;;ask markers [die]
  ;;create-markers 1 [set shape "default" set color grey set heading 0]
  ask patches with [pxcor = min-pxcor or pxcor = max-pxcor or pycor = max-pycor or pycor = min-pycor] [set pcolor green]
  ask patches with [pxcor = 2 and pycor = 2] [set pcolor green]
  ;ask patches with [pxcor = 1 and pycor = 1] [set pcolor green]
  ask patches with [pxcor = 2 and pycor = 3] [set pcolor green]
  ask patches with [pxcor = 3 and pycor = 2] [set pcolor green]
  ask patches with [pxcor = 4 and pycor = 4] [set pcolor green]
  ask patches [set old-color pcolor]
end

;;; Turns ernie by 90 degrees
to rotate-ernie
  ask ernests [rt 90]
end

;;; Main creation of ernest.
to create-ernest
 create-ernests 1 
 [set shape "default" 
  set heading 0 
  set color 27
  ;move-to clear-patch
  setxy 1 1
  ;; These are locals now in case we would like to include multiple agents.
  set stimuli "" 
  ;; The imos must be initialized in the context of each agent.
  ;imos:init 10 4
  imos:init 6 10
  imos:interaction "-" "   " touch-empty
  imos:interaction "-" "w  " touch-wall
  imos:interaction "\\" "   " touch-empty
  imos:interaction "\\" "w  " touch-wall
  imos:interaction "/" "   " touch-empty
  imos:interaction "/" "w  " touch-wall
  imos:interaction ">" "   " step
  imos:interaction ">" "w  " bump
  imos:interaction "v" "   " turn
  imos:interaction "v" "w  " turn
  imos:interaction "^" "   " turn
  imos:interaction "^" "w  " turn]
end

;;; ernest actions
to-report move
  ifelse wall-ahead 
   [ask patch-ahead 1 [pflash red] pstamp "default" red set bumps bumps + 1 report "w"] 
   ;;[ask patch-ahead 1 [pflash red] report "w"] 
   [pstamp "default" gray move-to patch-ahead 1 report " "]
end

to-report touch 
  ifelse wall-ahead 
    [ask patch-ahead 1 [pflash 44] pstamp "dot" orange report "w"]
    [ask patch-ahead 1 [pflash yellow] pstamp "dot" yellow report " "]
end

to-report touch-left
  let p patch-left-and-ahead 90 1
  ifelse [pcolor = green] of p 
    [ask p [ pflash 44] pstamp "dot" orange report "w"]
    [ask p [ pflash yellow] pstamp "dot" yellow report " "]
end

to-report touch-right
  let p patch-right-and-ahead 90 1
  ifelse [pcolor = green] of p
    [ask p [ pflash 44] pstamp "dot" orange report "w"]
    [ask p [ pflash yellow] pstamp "dot" yellow report " "]
end

to-report turn-left
  lt 90
  report " "
end

to-report turn-right
  rt 90
  report " "
end

;;; Is there a wall ahead?
to-report wall-ahead
  report [pcolor = green] of patch-ahead 1
end

;;; Should be organized with subroutines.
to do
  ;; the action
 set action imos:step  stimuli satisfaction
 if first action = ">"
  [set status move]
 if first action = "^"
  [set status turn-left ]
 if first action = "v"
  [set status turn-right ]
 if first action = "-"
  [set status touch ]
 if first action = "/"
  [set status touch-left ]
 if first action = "\\"
  [set status touch-right ]
 ;;;if any? targets-here [grab-target stop]
  ;; the stimuli
 ;let leftStimulus first [see] of out-left-eye-neighbors
 ;let rightStimulus first [see] of out-right-eye-neighbors
 ;set stimuli word leftStimulus  rightStimulus
 set stimuli "  "
 ;; the satisfaction
 if first action = ">" 
  [ifelse status = " "
    [set satisfaction step] 
    [set satisfaction bump]]
 if first action = "-" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "/" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "\\" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "^" or first action = "v" 
  [set  satisfaction turn]
 set stimuli word status stimuli

 ;; the trace 
 ;; output-write first action
 ;; setup-patches
 output-print word first action word " " word stimuli word " " satisfaction
end

;;; Utility to make a patch flash for a fragment of a second.
to pflash [c]
  set old-color pcolor
  set pcolor c
end

to pstamp [s c]
   ;hatch-trailmarks 1 [set shape s set size 0.5 set color c set when-created ticks]
end

;;; Switch wall with mouse
to place-remove-wall
  if mouse-down? 
  [every 0.2 [switch-wall]]
end
to switch-wall
  let p patch round mouse-xcor round mouse-ycor
   ifelse [pcolor = green] of p
    [ask p [set pcolor white]]
    [ask p [set pcolor green]]
end