The Small Loop Challenge

Powered by NetLogo. View/download the model file Ernest_V6-smallLoop2.nlogo, and download imos.jar and ernest.jar, which constitute the IMOS extension.

WHAT IS IT?

This NetLogo application provides a platform to investigate the "Small Loop Problem".

The Small Loop Problem is a challenge that we submit to the community of artificial developmental cognition. This challenge consists of implementing an artificial agent that would "smartly" organize its behavior through autonomous interaction with the "Small Loop Environment".

The Small Loop Environment is the environment displayed in this model. This environment, together with a set of predefined "possibilities of interaction" afforded to the agent, form the "Small Loop Platform".

The Small Loop Platform offers six possible actions: try to move one square forward (which succeeds if the square ahead is empty), turn 90° left, turn 90° right, touch the square ahead, touch the left square, and touch the right square. Each interaction returns a single bit of feedback to the agent that tells whether the agent interacted with a wall or an empty square (in the case of "touch" and "move forward"; "turn" actions always return the same binary value). The agent has no other way of "perceiving" the environment than this single bit received when enacting these interactions.

The Small Loop Problem thus consists of implementing an agent capable of "smart" organization of behavior through the ten interactions afforded by the Small Loop Platform (4 actions × 2 possible feedback values + 2 "turn" actions × 1 possible feedback value).
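This count can be made concrete with a short sketch (Python is used here purely for illustration; the action labels come from the description above, not from the model's code):

```python
# The six actions afforded by the Small Loop Platform, with the feedback
# values each can return. "Touch" and "move forward" report a wall bit;
# "turn" actions always return the same value.
ACTIONS = {
    "move forward": ("empty", "wall"),
    "turn left":    ("turned",),
    "turn right":   ("turned",),
    "touch ahead":  ("empty", "wall"),
    "touch left":   ("empty", "wall"),
    "touch right":  ("empty", "wall"),
}

# An "interaction" is an (action, feedback) pair; there are ten in total:
# 4 actions x 2 feedback values + 2 "turn" actions x 1 feedback value.
INTERACTIONS = [(a, f) for a, feedbacks in ACTIONS.items() for f in feedbacks]
```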

By "smart" organization of behavior, we mean behavior that consists of autonomously discovering, learning, and exploiting regularities of interaction to satisfy the agent's preferences, with minimal initial presuppositions about the environment encoded in the agent.

Minimal initial presupposition implies, in particular, that the interpretation of interactions must not be hard-coded in the agent but rather learned through experience. For instance, the interaction labeled "touch wall ahead" could mistakenly be relabeled "turn left" in a new experiment, and still result in the same behavior.

The agent's self-motivation is defined by values associated with interactions. These values are fixed and defined by the modeler. The Small Loop Problem thus consists of implementing a mechanism that tends to enact interactions with high values and to avoid interactions with negative values.

Typical values are: "move forward": 5; "bump wall": -10; "turn": -3; "touch empty square": -1; "touch wall": -2. With these values, the agent must try to maximize moving forward while avoiding bumping and turning. The agent thus needs to learn to use "touch" to perceive its environment and to turn only in appropriate categories of situations (because "touch" is "cheaper" than "bump" or "turn").
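As an illustration of how these values shape behavior, here is a minimal Python sketch (the `sequence_value` helper is hypothetical, not part of the model):

```python
# Default interaction values taken from the description above.
VALUES = {
    "move forward": 5,
    "bump wall": -10,
    "turn": -3,
    "touch empty square": -1,
    "touch wall": -2,
}

def sequence_value(interactions):
    """Sum the values of a sequence of enacted interactions."""
    return sum(VALUES[i] for i in interactions)

# Touching before moving is cheaper than bumping: touching an empty square
# and then moving forward yields -1 + 5 = 4, whereas blindly bumping a wall
# yields -10.
careful = sequence_value(["touch empty square", "move forward"])
blind = sequence_value(["bump wall"])
```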

The agent's algorithm must not presuppose the values associated with interactions. Trivial cases arise when positive values are associated with turning, bumping, or touching: the agent then learns to keep spinning in place, bumping, or touching indefinitely.

While the Small Loop Problem may look simplistic, we argue that it raises important challenges for developmental cognition: specifically, the challenges of autonomously learning hierarchical sequences of interactions and, simultaneously, learning spatial regularities of interaction. For instance, the agent should learn that, when a wall is touched on the left, turning left and trying to move forward would result in bumping. Please see our publications for more details on why traditional artificial learning techniques (such as reinforcement learning or Bayesian sequence learning) do not address these requirements.
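The spatial regularity in this example can be sketched as a simple predicate (hypothetical Python; in the challenge, of course, the agent must learn this relation through experience rather than have it hard-coded):

```python
# Sketch of the spatial regularity mentioned above: feedback from
# "touch left" predicts what happens after "turn left" then "move forward".
def outcome_after_turn_left(touch_left_feedback):
    """If a wall was touched on the left, turning left and trying to move
    forward will result in bumping; otherwise the move succeeds."""
    if touch_left_feedback == "wall":
        return "bump wall"
    return "move forward"
```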


HOW IT WORKS

This NetLogo application demonstrates our current response to the Small Loop Problem. The agent learns that it should touch ahead before trying to move forward, and not move forward if it touched a wall. It also learns that it should touch to the side before turning, and not turn in that direction if it touched a wall. This result is obtained by an original algorithm that we call the intrinsically motivated schema mechanism.


HOW TO USE IT

* Click "Run" to run the agent.
* Click "Step" to run the agent one step at a time.
* Click "Re-initialize" to restart the agent from scratch (clear the agent's memory).
* Click "Reset values" to reset the values of interactions to default.

* Use the "Time interval" cursor to slow down the agent.
* Use the 5 "value cursors" to define the values of interactions: "step" (move one square forward), "bump" wall, "turn" left or right, "touch empty" square, "touch wall". Then click "Re-initialize" to restart the agent with the values you have defined (values take effect when the agent is restarted).

* Use the "Origin x y" and "Rotate" buttons to start the experiment in different initial conditions (these buttons also re-initialize the agent).

When the agent touches a square, this square flashes yellow. When the agent bumps into a square, this square flashes red.


THINGS TO NOTICE

With the default values, notice that the agent learns to use "touch" interactions as active perception, resulting in more forward moves, less bumping, and fewer turns toward walls.

Notice that this agent is deterministic. It will always behave the same when starting from the same initial conditions.

This experiment shows that the agent becomes confused when it reaches the "inverted corner" of the loop. An agent that fully answers the Small Loop Challenge should be able to identify additional "spatial regularities" that would help it work out the two-dimensional structure of the environment. Such an agent should also learn to forgo touching when it is on a long edge of the loop, because moving forward there is safe.


THINGS TO TRY

Try different interaction values and observe that the agent organizes its behavior to obtain interactions with positive values. Note that the agent does not always find the sequences that lead to the highest positive values.


EXTENDING THE MODEL

This model can be extended either by completing the NetLogo code or by replacing the IMOS extension with your own extension.

We believe that fully solving the Small Loop Problem requires coupling a hierarchical sequential learning mechanism with a spatial learning mechanism, while still keeping the principles of self-motivation and minimal initial preconception. This challenge remains unsolved.


NETLOGO FEATURES

This model implements the IMOS extension (Intrinsic Motivation System, http://code.google.com/p/imos-netlogo/).


RELATED MODELS

We are not aware of any related NetLogo models.


CREDITS AND REFERENCES

This model was implemented by Olivier Georgeon (Université de Lyon / CNRS) and Ilias Sakellariou (University of Macedonia), using the IMOS NetLogo extension based on Georgeon and Ritter's paper "An Intrinsically Motivated Schema Mechanism to Model and Simulate Emergent Cognition" Cognitive Systems Research 15-16, pp. 73-92 (2012).


PROCEDURES

;;;;; Uncleaned version 

extensions [imos]

;;; Let's keep things organised
breed [eyes eye]
breed [ernests ernest]
breed [targets target]
breed [markers marker]


eyes-own [previously-seen distance-of-target] ;; Variables needed to encode change

directed-link-breed [left-eyes left-eye]
directed-link-breed [right-eyes right-eye]

ernests-own [action status stimuli satisfaction]

globals [rcount ##int_targ_status ##target-color]

patches-own [last-interaction]

;;; called when the model is first initialised.
to startup
  random-board
  ;place-target-random
end

to setup
  ca 
  clear-output
  reset-ticks 
  set rcount 0
  create-markers 1 [set shape "grid" set color black set heading 0 set hidden? true]
  draw-environment
  set ##target-color sky
  create-ernest
end

to setup-patches
  ask patches [ set pcolor last-interaction ]
end

;;; Create a random board and initialise Ernest.
to random-board
  setup
  ;place-target-random
end

;;; Sets default values to the learning component of Ernest.
to default
  set step 5;-1
  set bump -10;-8
  set turn -3;0
  set touch-empty -1
  set touch-wall -2
  setup
end

;;; Simple procedure to set ernest and the target in 
;;; prespecified places to learn a new strategy. 
;;; Learns with default values for interactions. 
to learn-strategy [ernx erny tx ty]
  default
  place-target-random
  ask ernests [setxy ernx erny]
  ask targets [setxy tx ty]
end

;;; Simple procedure to set ernest and the target in 
;;; prespecified places to learn a new strategy. 
;;; Learns with default values for interactions. 
to origin [ernx erny erno]
  default
  ask ernests [setxy ernx erny set color 27]
end


;;; Error Message reported when timeout expires.
to print-timeout-message
   output-print "*** Timeout!! ***"
   wait 0.5
end 


to run-experiment
  place-remove-targets
  wait time-interval
  ask ernests [do]
  tick
end

;; set up the environment (to look the same as the original ernest environment.)
to draw-environment
  ;;create-markers 1 [set shape "grid" set color grey set heading 0 set hidden? true]
  ;;foreach [(list pxcor pycor)] of patches [ask markers [move-to patch first ? item 1 ? stamp]]
  foreach [self] of patches [ask markers [move-to ? stamp] ask ? [set pcolor white]]
  ;;ask markers [die]
  ;;create-markers 1 [set shape "default" set color grey set heading 0]
  ask patches with [pxcor = min-pxcor or pxcor = max-pxcor or pycor = max-pycor or pycor = min-pycor] [set pcolor green]
  ask patches with [pxcor = 2 and pycor = 2] [set pcolor green]
  ;ask patches with [pxcor = 1 and pycor = 1] [set pcolor green]
  ask patches with [pxcor = 2 and pycor = 3] [set pcolor green]
  ask patches with [pxcor = 3 and pycor = 2] [set pcolor green]
  ask patches with [pxcor = 4 and pycor = 4] [set pcolor green]
end

;;; Custom Placement 

to custom-placement
user-message "Once you place your mouse in Ernest's environment you can set the position of Ernest by drag and drop, and place/remove targets by clicking on them. Move the mouse out of the area to finish."
while [not mouse-inside?] [wait 0.05]
  loop [
  ifelse any? ernests-on patch round mouse-xcor round mouse-ycor 
     [move-ernie]
     [place-remove-targets]
   if not mouse-inside? [stop]  
  ]  
end

to rotate-ernie
  ask ernests [rt 90]
end

to move-ernie
  while [mouse-down?] [ask ernests [setxy round mouse-xcor round mouse-ycor]]
end 

to place-remove-targets
  if mouse-down? 
  [every 0.2 [switch-status-wall]]
end  
  
to switch-status-targets
   ifelse any? targets-on patch round mouse-xcor round mouse-ycor
    [ask targets-on patch round mouse-xcor round mouse-ycor [die]]
    [create-targets 1 [set shape "ernest-target" set color ##target-color setxy round mouse-xcor round mouse-ycor]]
end

to switch-status-wall
  let p patch round mouse-xcor round mouse-ycor
   ifelse [pcolor = green] of p
    [ask p [set pcolor black]]
    [ask p [set pcolor green]]
end
;;; Just placing target with a mouse click. Carefully, since 
;;; without checks it produces a large number of overlapping patches
;;; target is placed in the center of the patch. 


;;; Clearing a target
;to clear-target
; if mouse-down? and any? targets-on patch round mouse-xcor round mouse-ycor
;    [ask targets-on patch round mouse-xcor round mouse-ycor [die]]
;end

;;; clear-all-targets
to clear-all-targets
  ask targets [die]
end

;;; Place a random target
to place-target-random
  let p clear-patch
  ifelse p = nobody 
  [user-message "Cannot place any more targets. Grid is Full!"]
  [create-targets 1 [set shape "ernest-target" set color ##target-color move-to p]]
  
end

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Main creation of ernest.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
to create-ernest
 create-ernests 1 
 [set shape "default" 
  set heading 0 
  set color 27
  ;;erny-has-eyes
  ;move-to clear-patch
  setxy 1 1
  ;; These are locals now in case we would like to include multiple agents.
  set stimuli "" 
  ;; The imos must be initialized in the context of each agent.
  ;imos:init 10 4
  imos:init 6 10
  imos:interaction "-" "   " touch-empty
  imos:interaction "-" "w  " touch-wall
  imos:interaction "\\" "   " touch-empty
  imos:interaction "\\" "w  " touch-wall
  imos:interaction "/" "   " touch-empty
  imos:interaction "/" "w  " touch-wall
  imos:interaction ">" "   " step
  imos:interaction ">" "w  " bump
  imos:interaction "v" "   " turn
  imos:interaction "v" "w  " turn
  imos:interaction "^" "   " turn
  imos:interaction "^" "w  " turn
  ]
end 

;;; Now we create the eyes of ernest and tie them to the body.
to erny-has-eyes 
  hatch-eyes 1 [setup-eyes -45 create-left-eye-from myself [tie]]
  hatch-eyes 1 [setup-eyes 45 create-right-eye-from myself [tie]]
end 

; setup the "eyes" of Ernie...
to setup-eyes [angle]
  set shape "ernest-eye" 
  set heading angle 
  set color white 
  set previously-seen false ;; nothing seen
  set distance-of-target 1000 ;; nothing seen 
end


;; Symbol  Actuator        Sensor  Description                                  Satisfaction
;;   ^     (^) Turn left   True    Turn 90° left toward adjacent empty square     0  (indifferent)
;;         [^]             False   Turn 90° left toward adjacent wall            -5  (dislike)
;;   >     (>) Forward     True    Move forward                                   0  (indifferent)
;;         [>]             False   Bump wall                                     -8  (dislike)
;;   v     (v) Turn right  True    Turn 90° right toward adjacent empty square    0  (indifferent)
;;         [v]             False   Turn 90° right toward adjacent wall           -5  (dislike)
;;          *  Appear              Target appears in distal sensor field         15  (love)
;;          +  Closer              Target approaches in distal sensor field      10  (enjoy)
;;          x  Reached             Target reached according to distal sensor     15  (love)
;;          o  Disappear           Target disappears from distal sensor field   -15  (hate)

;;; ernest actions
to-report move
  ifelse wall-ahead 
   [ask patch-ahead 1 [pflash red] pstamp "default" red report "w"] 
   [pstamp "default" gray move-to patch-ahead 1 report " "]
end 

to-report touch 
  ifelse wall-ahead 
    [ask patch-ahead 1 [pflash 44] pstamp "dot" orange report "w"]
    [ask patch-ahead 1 [pflash yellow] pstamp "dot" yellow report " "] 
end 

to-report touch-left
  let p patch-left-and-ahead 90 1
  ifelse [pcolor = green] of p 
    [ask p [ pflash 44] pstamp "dot" orange report "w"]
    [ask p [ pflash yellow] pstamp "dot" yellow report " "] 
end 

to-report touch-right
  let p patch-right-and-ahead 90 1
  ifelse [pcolor = green] of p
    [ask p [ pflash 44] pstamp "dot" orange report "w"]
    [ask p [ pflash yellow] pstamp "dot" yellow report " "] 
end 

to-report turn-left
  lt 90
  report " "
  ;ifelse wall-ahead 
  ;  [report "w"]
  ;  [report " "]
end

to-report turn-right
  rt 90
  report " "
  ;ifelse wall-ahead 
  ;  [report "w"]
  ;  [report " "]
end


;;; Is there a wall ahead?
to-report wall-ahead
  report [pcolor = green] of patch-ahead 1 
end

;;; Should be organized with subroutines.
to do
  ;; the action
 set action imos:step  stimuli satisfaction
 if first action = ">"
  [set status move]
 if first action = "^"
  [set status turn-left ]
 if first action = "v"
  [set status turn-right ]
 if first action = "-"
  [set status touch ]
 if first action = "/"
  [set status touch-left ]
 if first action = "\\"
  [set status touch-right ]
 if any? targets-here [grab-target stop]
  
  ;; the stimuli
 ;let leftStimulus first [see] of out-left-eye-neighbors
 ;let rightStimulus first [see] of out-right-eye-neighbors
 ;set stimuli word leftStimulus  rightStimulus
 set stimuli "  "
  
 ;; the satisfaction
 if first action = ">" 
  [ifelse status = " "
    [set satisfaction step] 
    [set satisfaction bump]]
 if first action = "-" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "/" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "\\" 
  [ifelse status = " "
    [set satisfaction touch-empty] 
    [set satisfaction touch-wall]]
 if first action = "^" or first action = "v" 
  [set  satisfaction turn]
  
 ;if leftStimulus = "x" [set satisfaction satisfaction + appear]
 ;if leftStimulus = "+" [set satisfaction satisfaction + closer]
 ;if leftStimulus = "*" [set satisfaction satisfaction + appear]
 ;if leftStimulus = "o" [set satisfaction satisfaction + disappear]
 ;if rightStimulus = "x" [set satisfaction satisfaction + appear]
 ;if rightStimulus = "+" [set satisfaction satisfaction + closer]
 ;if rightStimulus = "*" [set satisfaction satisfaction + appear]
 ;if rightStimulus = "o" [set satisfaction satisfaction + disappear]
 ;if stimuli = "oo" [set satisfaction -10] ;;; this helps
 
 set stimuli word status stimuli

 ;; the trace 
 ;; output-write first action
 ;;setup-patches
 output-print (word first action " " stimuli " " satisfaction)
end 

to grab-target 
  ask targets-here [die]
end


;;; Utilities 
to-report clear-patch 
  report one-of patches with [pcolor != green and not any? turtles-here ]
end

;;; Utility to make a patch flash for a fragment of a second.
to pflash [c]
  let old-color pcolor
  set pcolor c
  wait 0.2
  set pcolor old-color
end

to pstamp [s c]
  let place patch-here
  let head heading
  ;ask markers [set shape s set size 0.5 set heading head set color c move-to place stamp]
end