The Java applet should appear above this line. It was successfully tested with Firefox 22.0 and Chrome 28.0. If it does not appear then it probably has a compatibility problem with your browser. Please try another browser, or run it on your computer, or simply watch the demo video on our blog.
To run it on your computer: download NetLogo 4.1.3; download the model file: Ernest_V6-smallLoop3.nlogo; download the imos.jar and ernest.jar that constitute the IMOS extension. More information at the imos-netlogo forge.
This NetLogo model provides a platform to investigate the "Small Loop Problem".
The Small Loop Problem is a challenge that we submit to the community of artificial developmental cognition. This challenge consists of implementing an artificial agent that would "smartly" organize its behavior through autonomous interaction with the "Small Loop Environment".
The Small Loop Environment is the environment displayed in this model. This environment, together with a set of predefined "possibilities of interaction" afforded to the agent, form the "Small Loop Platform".
The Small Loop Platform offers six possibilities of action: try to move one square forward (succeeds if the square ahead is empty), turn 90° left, turn 90° right, touch the front square, touch the left square, touch the right square. Each interaction returns a single-bit feedback that tells the agent whether it interacted with a wall or an empty square (i.e., "step" vs "bump", "touch wall" vs "touch empty"; "turn" actions return a constant binary value). The agent has no other way of "perceiving" the environment than this single bit received when enacting these interactions.
The Small Loop Problem thus consists of implementing an agent capable of "smart" organization of behavior through the ten available interactions afforded by the Small Loop Platform (4 actions * 2 possible feedback values + 2 "turn" actions * 1 constant feedback).
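In the code of this model (procedure create-ernest), these interactions are encoded with a one-character action label and a one-character feedback ("w" for wall, " " for empty):
* ">": try to move forward ("step" or "bump")
* "-": touch front square, "/": touch left square, "\": touch right square ("touch empty" or "touch wall")
* "^": turn left, "v": turn right (constant feedback)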
By "smart" organization of behavior, we mean behavior consisting of autonomously discovering, learning, and exploiting regularities of interactions to satisfy the agent's preferences, with minimal initial presupposition of the environment being encoded in the agent.
Minimal initial presupposition implies, in particular, that the interpretation of interactions must not be hard-coded in the agent but rather learned through experience. For instance, if the interaction labeled "touch front wall" were relabeled "turn left" in a new experiment, the agent would still exhibit the same behavior, because the labels carry no meaning for the agent.
The agent's self-motivation is defined by values associated with interactions. These values are fixed and defined by the experimenter. The Small Loop Problem thus consists of implementing a mechanism that tends to enact interactions with high values and to avoid interactions with negative values.
We defined the reference values as follows: "move forward": 5; "bump wall": -10; "turn": -3; "touch empty square": -1; "touch wall": -2. With these values, the agent must try to maximize moving forward and avoid bumping and turning. The agent thus needs to learn to use "touch" to perceive its environment and to turn only in appropriate categories of situations (because "touch" is "cheaper" than "bump" or "turn").
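In the model code, these reference values are the defaults assigned in the default procedure below, and each interaction is declared to the IMOS extension together with its value in create-ernest, for example:

  imos:interaction ">" " " step          ;; move forward: 5
  imos:interaction ">" "w " bump         ;; bump wall: -10
  imos:interaction "^" " " turn          ;; turn: -3
  imos:interaction "-" " " touch-empty   ;; touch empty square: -1
  imos:interaction "-" "w " touch-wall   ;; touch wall: -2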
While we defined reference values, the agent's algorithm must not presuppose them, but should rather adapt to any set of values. Trivial examples are cases where positive values are associated with turning, bumping, or touching: the agent would then learn to keep spinning in place, bumping, or touching indefinitely.
While the Small Loop Problem looks simplistic, we argue that it raises important challenges regarding developmental cognition: specifically, the challenges of autonomously learning hierarchical sequences of interactions, and, simultaneously, learning spatial regularities of interaction. For instance, the agent should learn that, when a wall is touched on the left, then turning left and trying to move forward would result in bumping. Please see our publications for more details on why traditional artificial learning techniques (such as reinforcement learning or Bayesian sequence learning) do not address these requirements.
This NetLogo model demonstrates our current response to the Small Loop Problem. This agent learns that it should touch ahead before trying to move forward, and not move forward if it touched a wall. It also learns that it should touch to the side before turning, and not turn in that direction if it touched a wall. This result is obtained by an original algorithm published in Georgeon and Ritter (2012).
* Click "Run" to run the agent.
* Click "Step" to run the agent one step at a time.
* Click "Re-initialize" to restart the agent from scratch (clear the agent's memory).
* Click "Reset values" to reset the values of interactions to default.
* Use the "Time interval" cursor to slow down the agent.
* Use the 5 "value cursors" to define the values of interactions: "step" (move one square forward), "bump" wall, "turn" left or right, "touch empty" square, "touch wall". Then click "Re-initialize" to restart the agent with the values you have defined (values take effect when the agent is restarted).
* Use the "Initial-position" to select an intial position, then click "Place Agent" and possibly "Rotate" to start the experiment in different initial conditions (The "Place Agent" button also re-initializes the agent).
When the agent touches a square, this square flashes yellow. When the agent bumps into a square, this square flashes red.
With the default values, notice that the agent learns to use "touch" interactions as active perception, which yields more forward moves, less bumping, and fewer turns towards walls.
Notice that, once the touching behavior has been learned, the agent always uses it. A smarter agent should learn to forgo touching when it is on the long edge of the loop, because it should reckon that moving forward is safe in this situation.
Notice that this agent is deterministic. It will always behave the same when starting from the same initial conditions.
Start the experiment from position [1 1], orientation upwards. Notice that the agent reaches a stable behavior from approximately tick 300.
Start the experiment from position [1 3], orientation upwards. Notice that the behavior gets organized from approximately tick 600. Before that, the agent has trouble dealing with the upper-right corner of the loop.
An agent that fully answers the Small Loop Challenge should reliably demonstrate smarter behavior in the upper-right corner by reckoning the two-dimensional structure of the environment.
Try different interaction values and observe that the agent organizes its behavior to obtain interactions with positive values. Note that the agent does not always find the sequences that lead to the highest positive values.
Click on the grid to add or remove walls while the agent is running, and see how the agent deals with unexpected conditions.
This model can be extended either by completing the NetLogo code or by replacing the IMOS extension with your own extension.
We suspect that fully solving the Small Loop Problem would require making the agent capable of some form of rudimentary reflexivity. This is why we find this challenge interesting!
This model uses the IMOS extension (Intrinsic Motivation System, http://code.google.com/p/imos-netlogo/).
This model was implemented by Olivier Georgeon (Université de Lyon / CNRS) and Ilias Sakellariou (University of Macedonia), using the IMOS NetLogo extension based on Georgeon and Ritter's paper "An Intrinsically Motivated Schema Mechanism to Model and Simulate Emergent Cognition", Cognitive Systems Research 15-16, pp. 73-92 (2012).
extensions [imos]
;;; Let's keep things organised
breed [ernests ernest]
breed [targets target]
breed [markers marker]
breed [trailmarks trailmark]
directed-link-breed [left-eyes left-eye]
directed-link-breed [right-eyes right-eye]
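;;; Agent variables: action is the last action string reported by imos:step; status is the
;;; feedback character of the last interaction ("w" = wall, " " = empty); stimuli is the
;;; stimulus string passed to imos:step on the next cycle; satisfaction is the value of the
;;; last enacted interaction.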
ernests-own [action status stimuli satisfaction]
patches-own [last-interaction old-color]
trailmarks-own [when-created]
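;;; Plot counters: bumps is the total number of bump interactions; diffbumps stores the bump
;;; count at the previous plot update.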
globals [rcount ##int_targ_status ##target-color bumps diffbumps]
;;; called when the model is first initialised.
to startup
setup
end
to setup
ca
clear-output
reset-ticks
set rcount 0
create-markers 1 [set shape "grid" set color black set heading 0 set hidden? true]
draw-environment
set ##target-color sky
set bumps 0
set diffbumps 0
create-ernest
end
to setup-patches
ask patches [ set pcolor last-interaction ]
end
;;; Sets default values to the learning component of Ernest.
to default
set step 5;-1
set bump -10;-8
set turn -3;0
set touch-empty -1
set touch-wall -2
setup
end
;;; Simple procedure to set ernest and the target in
;;; prespecified places to learn a new strategy.
;;; Learns with default values for interactions.
;to origin [ernx erny erno]
; default
; ask ernests [setxy ernx erny set color 27]
;end
to origin
default
ask ernests [setxy first initial-position item 1 initial-position set color 27]
end
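;;; One simulation cycle: handle mouse clicks on the grid, clear flashed patches,
;;; let Ernest choose and enact one interaction, then update the plot.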
to run-experiment
place-remove-wall
wait time-interval
ask patches with [pcolor = yellow or pcolor = 44 or pcolor = red] [set pcolor old-color]
ask trailmarks with [when-created + 10 = ticks] [die] ;; trail marks disappear 10 ticks after creation (change 10 to keep them longer)
ask ernests [do]
plot-things
tick
end
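;;; Update the bump plot every 10 ticks.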
to plot-things
if (ticks mod 10 = 0)
[
set-current-plot "Bump Interactions Count"
set-current-plot-pen "Number of Bumps"
plot bumps
set-current-plot-pen "Bumps over last 10 ticks"
plot (bumps - diffbumps)
set diffbumps bumps
]
end
;; Set up the environment (to look the same as the original Ernest environment).
to draw-environment
;;create-markers 1 [set shape "grid" set color grey set heading 0 set hidden? true]
;;foreach [(list pxcor pycor)] of patches [ask markers [move-to patch first ? item 1 ? stamp]]
foreach [self] of patches [ask markers [move-to ? stamp] ask ? [set pcolor white] ]
;;ask markers [die]
;;create-markers 1 [set shape "default" set color grey set heading 0]
ask patches with [pxcor = min-pxcor or pxcor = max-pxcor or pycor = max-pycor or pycor = min-pycor] [set pcolor green]
ask patches with [pxcor = 2 and pycor = 2] [set pcolor green]
;ask patches with [pxcor = 1 and pycor = 1] [set pcolor green]
ask patches with [pxcor = 2 and pycor = 3] [set pcolor green]
ask patches with [pxcor = 3 and pycor = 2] [set pcolor green]
ask patches with [pxcor = 4 and pycor = 4] [set pcolor green]
ask patches [set old-color pcolor]
end
;;; Turns ernie by 90 degrees
to rotate-ernie
ask ernests [rt 90]
end
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Main creation of ernest.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
to create-ernest
create-ernests 1
[set shape "default"
set heading 0
set color 27
;;erny-has-eyes
;move-to clear-patch
setxy 1 1
;; These are locals now in case we would like to include multiple agents.
set stimuli ""
;; The imos must be initialized in the context of each agent.
;imos:init 10 4
imos:init 6 10
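;; Declare the primitive interactions: action label, feedback string, and value (from the interface sliders).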
imos:interaction "-" " " touch-empty
imos:interaction "-" "w " touch-wall
imos:interaction "\\" " " touch-empty
imos:interaction "\\" "w " touch-wall
imos:interaction "/" " " touch-empty
imos:interaction "/" "w " touch-wall
imos:interaction ">" " " step
imos:interaction ">" "w " bump
imos:interaction "v" " " turn
imos:interaction "v" "w " turn
imos:interaction "^" " " turn
imos:interaction "^" "w " turn
]
end
;;; ernest actions
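;;; Each reporter enacts one action and reports the feedback character ("w" when a wall is involved, " " otherwise).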
to-report move
ifelse wall-ahead
[ask patch-ahead 1 [pflash red] pstamp "default" red set bumps bumps + 1 report "w"]
;;[ask patch-ahead 1 [pflash red] report "w"]
[pstamp "default" gray move-to patch-ahead 1 report " "]
end
to-report touch
ifelse wall-ahead
[ask patch-ahead 1 [pflash 44] pstamp "dot" orange report "w"]
[ask patch-ahead 1 [pflash yellow] pstamp "dot" yellow report " "]
end
to-report touch-left
let p patch-left-and-ahead 90 1
ifelse [pcolor = green] of p
[ask p [ pflash 44] pstamp "dot" orange report "w"]
[ask p [ pflash yellow] pstamp "dot" yellow report " "]
end
to-report touch-right
let p patch-right-and-ahead 90 1
ifelse [pcolor = green] of p
[ask p [ pflash 44] pstamp "dot" orange report "w"]
[ask p [ pflash yellow] pstamp "dot" yellow report " "]
end
to-report turn-left
lt 90
report " "
end
to-report turn-right
rt 90
report " "
end
;;; Is there a wall ahead?
to-report wall-ahead
report [pcolor = green] of patch-ahead 1
end
;;; Should be organized with subroutines.
to do
;; the action
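;; imos:step is given the stimulus and satisfaction of the previous cycle and reports the action to enact next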
set action imos:step stimuli satisfaction
if first action = ">"
[set status move]
if first action = "^"
[set status turn-left ]
if first action = "v"
[set status turn-right ]
if first action = "-"
[set status touch ]
if first action = "/"
[set status touch-left ]
if first action = "\\"
[set status touch-right ]
;;;if any? targets-here [grab-target stop]
;; the stimuli
;let leftStimulus first [see] of out-left-eye-neighbors
;let rightStimulus first [see] of out-right-eye-neighbors
;set stimuli word leftStimulus rightStimulus
set stimuli " "
;; the satisfaction
if first action = ">"
[ifelse status = " "
[set satisfaction step]
[set satisfaction bump]]
if first action = "-"
[ifelse status = " "
[set satisfaction touch-empty]
[set satisfaction touch-wall]]
if first action = "/"
[ifelse status = " "
[set satisfaction touch-empty]
[set satisfaction touch-wall]]
if first action = "\\"
[ifelse status = " "
[set satisfaction touch-empty]
[set satisfaction touch-wall]]
if first action = "^" or first action = "v"
[set satisfaction turn]
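;; the stimulus passed to imos:step on the next cycle is the feedback character followed by a space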
set stimuli word status stimuli
;; the trace
;; output-write first action
;; setup-patches
output-print word first action word " " word stimuli word " " satisfaction
end
;;; Utility to make a patch flash for a fragment of a second.
to pflash [c]
set old-color pcolor
set pcolor c
end
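;;; Would leave a trail mark on the current patch (disabled: the hatch call is commented out).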
to pstamp [s c]
;hatch-trailmarks 1 [set shape s set size 0.5 set color c set when-created ticks]
end
;;; Switch wall with mouse
to place-remove-wall
if mouse-down?
[every 0.2 [switch-wall]]
end
to switch-wall
let p patch round mouse-xcor round mouse-ycor
ifelse [pcolor = green] of p
[ask p [set pcolor white]]
[ask p [set pcolor green]]
end