Transcript
Exploring Robotic Minds
OXFORD SERIES ON COGNITIVE MODELS AND ARCHITECTURES
Series Editor Frank E. Ritter
Series Board: Rich Carlson, Gary Cottrell, Robert L. Goldstone, Eva Hudlicka, William G. Kennedy, Pat Langley, Robert St. Amant
Integrated Models of Cognitive Systems Edited by Wayne D. Gray
In Order to Learn: How the Sequence of Topics Influences Learning Edited by Frank E. Ritter, Joseph Nerb, Erno Lehtinen, and Timothy O’Shea
How Can the Human Mind Occur in the Physical Universe? By John R. Anderson
Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition By Joscha Bach
The Multitasking Mind By David D. Salvucci and Niels A. Taatgen
How to Build a Brain: A Neural Architecture for Biological Cognition By Chris Eliasmith
Minding Norms: Mechanisms and Dynamics of Social Order in Agent Societies Edited by Rosaria Conte, Giulia Andrighetto, and Marco Campennì
Social Emotions in Nature and Artifact Edited by Jonathan Gratch and Stacy Marsella
Anatomy of the Mind: Exploring Psychological Mechanisms and Processes with the Clarion Cognitive Architecture By Ron Sun
Exploring Robotic Minds: Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena By Jun Tani
Exploring Robotic Minds
Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena
Jun Tani
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries. Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America. © Oxford University Press 2017. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this work in any other form and you must impose this same condition on any acquirer. Library of Congress Cataloging-in-Publication Data: Names: Tani, Jun, 1958– author. Title: Exploring robotic minds: actions, symbols, and consciousness as self-organizing dynamic phenomena / Jun Tani. Description: Oxford; New York: Oxford University Press, [2017] | Series: Cognitive models and architectures | Includes bibliographical references and index. Identifiers: LCCN 2016014889 (print) | LCCN 2016023997 (ebook) | ISBN 9780190281069 (hardcover: alk. paper) | ISBN 9780190281076 (UPDF). Subjects: LCSH: Artificial intelligence. | Robotics. | Cognitive neuroscience. Classification: LCC Q335 .T3645 2017 (print) | LCC Q335 (ebook) | DDC 629.8/9263—dc23. LC record available at https://lccn.loc.gov/2016014889. Printed by Sheridan Books, Inc., United States of America.
Contents
Foreword by Frank E. Ritter
Preface
Part I On the Mind
1. Where Do We Begin with Mind?
2. Cognitivism
2.1 Composition and Recursion in Symbol Systems
2.2 Some Cognitive Models
2.3 The Symbol Grounding Problem
2.4 Context
2.5 Summary
3. Phenomenology
3.1 Direct Experience
3.2 The Subjective Mind and Objective World
3.3 Time Perception: How Can the Flow of Subjective Experiences Be Objectified?
3.4 Being-in-the-World
3.5 Embodiment of Mind
3.6 Stream of Consciousness and Free Will
3.7 Summary
4. Introducing the Brain and Brain Science
4.1 Hierarchical Brain Mechanisms for Visual Recognition and Action Generation
4.2 A New Understanding of Action Generation and Recognition in the Brain
4.3 How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?
4.4 Deciding Among Conflicting Evidence
4.5 Summary
5. Dynamical Systems Approach for Modeling Embodied Cognition
5.1 Dynamical Systems
5.2 Gibsonian and Neo-Gibsonian Approaches
5.3 Behavior-Based Robotics
5.4 Modeling the Brain at Different Levels
5.5 Neural Network Models
5.6 Neurorobotics from the Dynamical Systems Perspective
5.7 Summary
Part II Emergent Minds: Findings from Robotics Experiments
6. New Proposals
6.1 Robots with Subjective Views
6.2 Engineering Subjective Views into Neurodynamic Models
6.3 The Subjective Mind and the Objective World as an Inseparable Entity
7. Predictive Learning About the World from Actional Consequences
7.1 Development of Compositionality: The Symbol Grounding Problem
7.2 Predictive Dynamics and Self-Consciousness
7.3 Summary
8. Mirroring Action Generation and Recognition with Articulating Sensory–Motor Flow
8.1 A Mirror Neuron Model: RNNPB
8.2 Embedding Multiple Behaviors in Distributed Representation
8.3 Imitating Others by Reading Their Mental States
8.4 Binding Language and Action
8.5 Summary
9. Development of Functional Hierarchy for Action
9.1 Self-Organization of Functional Hierarchy in Multiple Timescales
9.2 Robotics Experiments on Developmental Training of Complex Actions
9.3 Summary
10. Free Will for Action and Conscious Awareness
10.1 A Dynamic Account of Spontaneous Behaviors
10.2 Free Will, Consciousness, and Postdiction
10.3 Summary
11. Conclusions
11.1 Compositionality in the Cognitive Mind
11.2 Phenomenology
11.3 Objective Science and Subjective Experience
11.4 Future Directions
11.5 Summary
Glossary for Abbreviations
References
Index
Foreword
Frank E. Ritter
This book describes the background and results from Jun Tani’s work of over a decade of building robots that think and learn through interaction with the world. It has numerous useful and deep lessons for modelers developing and using symbolic, subsymbolic, and hybrid architectures, so I am pleased to see it in the Oxford Series on Cognitive Models and Architectures. It is work that is in the spirit of Newell and Simon’s (1975) theory of empirical exploration of computer science topics and their work on generation of behavior, and it also takes Newell and Simon’s and Feynman’s motto of understanding through generation of behavior seriously. At the same time, this work extends the physical symbol hypothesis in a very useful way by suggesting by example that the symbols of human cognition need not be discrete symbols manually fed into computers (which we have often done in symbolic cognitive architectures), but can instead be composable neuro-dynamic structures arising through iterative learning of perceptual experience with the physical world. Tani’s work has explored some of the deep issues in embodied cognition: how interaction with the environment happens, what this means for representation and learning, and how more complex behavior can be created, or how it arises, through simpler aspects. These lessons include insights about the role of interaction with the environment, consciousness and free will, and lessons about how to build neural net architectures to drive behavior in robots.
The book starts with a review of the foundations of this work, including some of the philosophical foundations in this area (including the symbol grounding problem, phenomenology, and the role of time in thinking). It argues for a role of hierarchy in modeling cognition, and for modeling and understanding interaction with an external world. The book also notes that state space attractors can be a useful concept in understanding cognition, and, I would add, this could be a useful additional way to measure fit of a model to behavior. This review also reminds us of areas that current symbolic models have been uninformed by—I don’t think that these topics have been so much ignored as put on a list for later work. These aspects are becoming more timely, as Tani’s work shows they can be. The review chapters make this book particularly useful as an advanced textbook, a purpose for which Tani already uses it. Perhaps more importantly, in the second half of the book (Chapters 6 to 11) Tani describes lessons from his own work. This work argues that behavior is not always programmed or extant in a system, but that it can, or often should, arise in systems attempting to achieve homeostasis—that there are positions of stability in a mental representation (including modeling others, imitation), and that differences in knowledge between the levels can give rise to effects that might be seen as a type of consciousness: a mental trace of what lower levels should do or are doing, or explanations of what they have done based on predictions of the agent’s own behavior, a type of self-reflexive mental model. These results suggest that more models should model homeostasis and include more goals and knowledge about how to achieve it. His work provides another way of representing and generating behavior. This way emphasizes the dynamic behavior of systems rather than the data structures used in more traditional approaches. The simple ideas of evolution of knowledge, feedback, attractors, and further concepts provide food for thought for all systems that generate behavior. These components are reviewed in the first part of the book. The second part of the book also presents several systems used to explore these ideas. Lessons from this book could and should change how we see all kinds of cognitive architectures. Many of these concepts have not yet been noticed in symbolic architectures, but they probably exist in them. This new way to examine behavior in architectures has already provided insights about learning, interaction, and consciousness. Using these concepts in existing architectures and models will provide new insights
into how compositional thoughts and actions can be generated without facing the notorious symbol grounding problem or, ultimately, the mind–body problem. In his work on layers of representation, he has seen that higher levels might not just lead the lower levels, but also follow them, adjusting their own settings based on the lower levels’ behavior. An interpretation of the higher levels trying to follow or predict the lower levels provides a potential computational description and explanation of some forms of consciousness and free will. I found these concepts particularly intriguing: not only that higher levels could follow and not lead lower levels, but that the mismatch could lead to a kind of postdiction in which intention becomes consciously apparent only after action. We might see this elsewhere as other architectures, their environments, and their interaction with the environment become more complex, and indeed should look for it. I hope you find the book as useful, and as suggestive of new areas of work and new aspects of behavior to consider for inclusion in architectures, as I have.
Preface
The mind is ever elusive, and imagining its underlying mechanisms remains a constant challenge. This book attempts to show a clear picture of how the mind might work, based on tangible experimental data I have obtained over the last two decades during my work to construct the minds of robots. The essential proposal of the book is that the mind is composed of emergent phenomena, which appear via intricate and often conflictive interactions between the top-down subjective view for proactively acting on the external world and the bottom-up recognition of the resultant perceptual reality. This core idea can provide a scaffold to account for the various fundamental aspects of the mind and cognition. Allowing entangling interactions between the top-down and bottom-up processes means that the skills we need to generate complex actions, the knowledge and concepts for representing the world, and the linguistic competency we need to express our experiences can naturally develop—and the cogito¹ that allows this “compositional” yet fluid thinking and action appears to be embedded in dynamic neural structures. The crucial argument here is that this cogito is free from the problems inherent in Cartesian dualism, such as that of interaction: how a nonmaterial mind can cause anything in a material body and world, and vice versa. We avoid such problems because the cogito embedded
1. Cogito is from a Latin philosophical proposition by René Descartes, “Cogito ergo sum,” which has been translated as “I think, therefore I am.” Here, cogito denotes a subject of cognizing or thinking.
in the continuous state space of dynamic neural systems is also matter, rather than nonmatter composed of a discrete symbol system or logic. Therefore, the cogito can interact physically with the external world: As one side pushes forward a little, the other side pulls back elastically so that a point of compromise can be found in conflictive situations through iterative dynamics. It is further proposed that even the nontrivial problem of consciousness (what David Chalmers has called the hard problem
of consciousness) and free will can become accessible by considering that consciousness is also an emergent phenomenon of matter, arising inevitably from such conflictive interactions. The matter here is alive and vivid in never-ending trials by the cogito to comprehend an ever-changing reality in an open-ended world. Each of these statements—my proposals on the workings of the mind—will be examined systematically by reviewing multidisciplinary discussions, largely from the fields of neuroscience, phenomenology, nonlinear dynamics, psychology, cognitive science, and cognitive robotics. Indeed, the book aims at a unique way of understanding the mind through a rather unusual but inspiring combination of ingredients: humanoid robots, Heidegger’s philosophy, deep learning neural nets, strange attractors from chaos theory, mirror neurons, Gibsonian psychology, and more. The book has been written with a multidisciplinary audience in mind. Each of the chapters starts by presenting general concepts or tutorials on each discipline—cognitive science, phenomenology, neuroscience and brain science, nonlinear dynamics, and neural network modeling—before exploring the subjects specifically in relation to the emergent phenomena which I believe constitute the mind. By providing a brief introduction to each topic, I hope that a general audience and undergraduate students with a specific interest in this subject will enjoy reading on to the more technical aspects of the book that describe the neurorobotics experiments. I have debts of gratitude to many people. First of all, I thank Jeffrey White for plenty of insightful advice on this manuscript in regard to its contents, as well as for editing its English and examining every page. I would like to commend and thank all members of my former laboratory at RIKEN as well as of the current one at the Korea Advanced Institute of Science and Technology (KAIST) who, over the years, have contributed to the research described in this book. I am lucky to have many research friends with whom I can have in-depth discussions about shared interests. Takashi Ikegami has been one of the most inspiring. His
stroke of genius and creative insights on the topics of life and the mind are irreplaceable. I admit that many of my research projects described in this book have been inspired by thoughtful discussions with him. Ichiro Tsuda provided me with deep thoughts about possible roles of chaos in the brain. The late Joseph Goguen and the late Francisco Varela generously offered me much advice about the links between neurodynamics and phenomenology. Karl Friston has provided me thoughtful advice in the research of our shared interests on many occasions. Michael Arbib offered insight into the concept of action primitives and mirror neuron modeling. He kindly read my early draft and sent it to Oxford University Press. I have been inspired by frequent discussions about developmental robotics with Minoru Asada and Yasuo Kuniyoshi. I would like to express my gratitude and appreciation to Masahiro Fujita, Toshitada Doi, and Mario Tokoro of Sony Corporation, who kindly provided me with the chance to start my neurorobotics studies more than two decades ago in an elevator hall in a Sony building. I must thank Masao Ito and Shun-ichi Amari at RIKEN Brain Science Institute for their thoughtful advice on my research in general. And I express my gratitude to Miki Sagara, who prepared many figures. I am grateful to Frank Ritter, as the Oxford series editor on cognitive models and architectures, who kindly provided me advice and suggestions from micro details to macro levels of this manuscript during its development. The book could not have been completed in the present form without his input. I wish to thank my Oxford University Press editor Joan Bossert for her cordial support and encouragement from the beginning. Finally, my biggest thanks go to my wife, Tomoko, who professionally photographed the book’s cover image; my son, Kentaro; and my mother, Harumi. I could not have completed this book without their patient and loving support. This book is dedicated to the memory of my father, Yougo Tani, who ignited my interest in science and engineering before he passed away in my childhood. Some additional resources such as robot videos can be found at https://sites.google.com/site/tanioupbook/home. Finally, this work was partially supported by the RIKEN BSI Research Fund (2010–2011) and the 2012 KAIST Settlement and Research of New Instructors Fund, titled “Neuro-Robotics Experiments with Large Scale Brain Networks.”
Part I On the Mind
1 Where Do We Begin with Mind?
How do our minds work? Sometimes I notice that I act without much consciousness, for example, when reaching for my mug of coffee on the table, putting on a jacket, or walking to the station for my daily commute. However, if something unexpected happens, like I fail to grasp the mug properly or the road to the station is closed due to roadwork, I suddenly become conscious of my actions. How does this consciousness arise at such moments? In everyday conversation, my utterances are generated smoothly. I automatically combine words in the correct order and seldom consciously manipulate grammar when speaking. How is this possible? Although it seems that many of our thoughts and actions are generated either consciously or unconsciously by utilizing knowledge or concepts in terms of images, rules, and symbols, I wonder how they are actually stored in our memories and how they can be manipulated in our minds. When I’m doing something like making a cup of coffee, my actions as well as thoughts tend to shift freely from getting out the milk to looking out the window to thinking about whether to stay in for lunch today. Is this spontaneous switching generated by my will? If so, how is such will initiated in my mind in the first place? Mostly, my everyday thinking or action follows routines, habituation, or social conventions. Nevertheless, sometimes novel images, thoughts, or acts can be created. How are they generated? Finally, a somewhat philosophical question arises: How can I believe that this world really exists
without my subjectively thinking about it? Does my subjective mind subsume the reality of the world, or is it the other way around? The mind is one of the most curious and miraculous things. We know that the phenomena of the mind, like those just described, originate in the brain: We often hear scientists saying that our minds are the products of “entangled” activities of neurons firing, synapse modulations, neuronal chemical reactions, and more. Although the scientific literature contains an abundance of detailed information about such biological phenomena in the brain, it is still difficult to find satisfactory explanations of how the mind actually works. This is because each piece of detailed knowledge about the biological brain cannot as yet be connected together well enough to produce a comprehensive picture of the whole. But understanding the mind is not only the remit of scientists; it is and has always been the job of philosophers, too. One of the greatest of philosophers, Aristotle, asserted that “The mind is the part of the soul by which it knows and understands” (Aristotle, Trans. 1907). It is hard, however, to link such metaphysical arguments to the actual biological reality of the brain. Twenty-five years ago, I was a chemical plant engineer with no such thoughts about the brain, consciousness, and existence, until something wonderful happened by chance to start me thinking about these things seriously. One day I traveled to a chemical plant site in an isolated area in northern Japan to examine a hydraulic system consisting of piping networks. The pipeline I saw there was huge, with a diameter of more than 1.5 m and a total length of around 20 km. It originated in a shipyard about 10 km away from the plant, and inside the plant yard it was connected to a complex of looping networks equipped with various functional components such as automatic control valves, pumps, surge accumulators, and tanks. I was conducting an emergency shutdown test of one of the huge main valves downstream in the pipeline when, immediately after valve shutdown, I was terrified by the thundering noise of the “water hammer” phenomenon, the loud knocking heard in a pipe caused by an abrupt pressure surge upstream of the valve. Several seconds later I heard the same sound arising from various locations around the plant yard, presumably because the pressure surge had propagated and was being reflected at various terminal ends in the piping network. After some minutes, although the initial thunderous noise had faded, I noticed a strange coherence of sounds occurring across the yard. I heard “a pair”
of water hammers at different places, seeming to respond to each other periodically. This coherence appeared and disappeared almost capriciously, arising again in other locations. I went back to the plant control room to examine the operation records, plotting the time history of the internal pressure at various points in the piping network. As I thought, the plots showed some oscillatory patterns of pressure hikes appearing at certain points and tending to transform into other oscillatory patterns within several minutes. Sometimes these patterns seemed to form in a combinatory way, with a set of patterns appearing in different combinations with other sets. At that point I jumped on a bicycle to search for more water hammers around the plant yard even though it was already dusk. Hearing this mysterious ensemble of roaring pipes in the darkness, I felt as if I were exploring inside a huge brain, where its consciousness arose. In the next moment, however, I stopped and reflected to myself that this was not actually a mystery at all but complex transient phenomena involving physical systems, and I thought then that this might explain the spontaneous nature of the mind. I had another epiphany several months later when, together with my fellow engineers, I had the chance to visit a robotics research laboratory, one of the most advanced of its kind in Japan. The researchers there showed us a sophisticated mobile robot that could navigate around a room guided by a map preprogrammed into the robot’s computer. During the demonstration the robot maneuvered around the room, stopped in front of some objects, and said in a synthesized voice, “This is a refrigerator,” “This is a blackboard,” and “This is a couch.” While we all stood amazed at seeing the robot correctly naming the objects around us, I asked myself how the robot could know what a refrigerator meant. To me, a refrigerator means the smell of refreshing cool air when I open the door to get a beer on a long hot summer day. Surely the robot didn’t understand the meaning of a refrigerator or a chair in such a way, as these items were nothing more to it than landmarks on a registered computational map. The meanings of these items to me, however, would materialize as the result of my own experiences with them, such as the smell of cool air from the refrigerator or the feeling of my body sinking back into a soft chair as I sit down to drink my beer. Surely the meanings of various things in the world around us would be formed in our brains through the accumulation of our everyday experiences interacting with them. In the next moment I started to think about building my own robot, one that could have a subjective mind, experience
feelings, imagine things, and think about the world by interacting in it. I also had some vague notion that a subjective mind should involve dynamic phenomena fluttering between the conscious and unconscious, just as with the water hammers that had captured my imagination a few months earlier. Sometime later I went back to school, where I studied many subjects related to the mind and cognition, including cognitive science, robotics, neuroscience, neural network modeling, and philosophy. Each discipline seemed to have its own specific way of understanding the mind, and the way the problems were approached by each discipline seemed too narrow to allow exchanging ideas and views with other disciplines. No single discipline could fully explain what the mind is or how it works. I simply didn’t believe that one day a super genius like Einstein would come along and show us a complete picture of the mind; rather, I suspected that a good understanding, if attainable, would come from a mutual, relational understanding between multiple disciplines, enabling new findings and concepts in one domain to be explainable using different expressions in other disciplines. It was then that it came to me that building robots while taking a multidisciplinary approach could well produce a picture of the mind. The current book presents the outcome of two decades of research under this motivation.
* * *
This book asks how natural or artificial systems can host cognitive minds that are characterized by higher order cognitive capabilities such as compositionality on the one hand, and by autonomy in generating spontaneous interactions with the outer world, either consciously or unconsciously, on the other. The book draws answers from examination of synthetic neurorobotics experiments conducted by the author. The underlying motivation of this study differs from that of conventional intelligent robotics studies that aim to design or program functions to generate intelligent actions. The aim of synthetic neurorobotics studies is to examine experimentally the emergence of nontrivial mindlike phenomena through dynamic interactions, under specific conditions and for various “cognitive” tasks. It is like examining the emergence of nontrivial patterns of water hammer phenomena under the specific operational conditions applied in complex pipeline networks.
The synthetic neurorobotics studies described in this book have two foci. One is to make use of dynamical systems perspectives to understand various intricate mechanisms characterizing cognitive minds. The dynamical systems approach has been known to be effective in articulating mechanisms underlying the development of various functional structures by applying the principles of self-organization from physics (Nicolis & Prigogine, 1977; Haken, 1983). Structures and functions to mechanize higher order cognition, such as for compositional manipulations of “symbols,” concepts, or linguistic thoughts, may develop by means of self-organization in internal neurodynamic systems via the consolidative learning of experience. The other focus of these neurorobotics studies is on the embodiment of cognitive processes, crucial to understanding the circular causality arising between body and environment as aspects of mind extend beyond the brain. This naturally brings us to the distinction between the subjective mind and the objective world. Our studies emphasize top-down intentionality on the one hand, by which our own subjective images, views, and thoughts, consolidated into structures through past experience, are proactively projected onto the objective world, guiding and accompanying our actions. Our studies also emphasize bottom-up recognition of the perceptual reality on the other hand, which results in the modification of top-down intention in order to minimize gaps or errors between our prior expectations and actual outcomes. The crucial focus here is on the circular causality that emerges as the result of iterative interactions between the two processes of the top-down subjective intention of acting on the objective world and the bottom-up recognition of the objective world with modification of the intention. My intuition is that the key to unlocking all of the mysteries of the mind, including our experiences of consciousness as well as free will, is hidden in this as yet unexplored phenomenon of circular causality and the structure within which it occurs. Moreover, close examination of this structure might help us address the fundamental philosophical problem brought to the fore in mind/body dualism: how the subjective mind and the objective world are related. The synthetic robotics approach described in this book seeks to answer this fundamental question through the examination of actual experimental results from the viewpoints of various disciplines. This book is organized into two parts, namely “Part I On the Mind” from chapter 1 to chapter 5, and “Part II Emergent Minds: Findings from Robotics Experiments” from chapter 6 to chapter 11. In Part I, the
book reviews how problems with cognitive minds have been explored in different research fields, including cognitive science, phenomenology, brain science, neural network modeling, psychology, and robotics. These in-depth reviews will provide general readers with a good introduction to the relevant disciplines and should help them to appreciate the many conflicting arguments about the mind and brain active therein. Part II starts with new proposals for tackling these problems through neurorobotics experiments, and through analysis of their results arrives at some answers to fundamental questions about the nature of the mind. In the end, this book traces my own journey in exploration of the fundamental nature of the mind, and in retracing this journey I hope to deliver an intuitively accessible account of how the mind works.
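The circular causality just described, top-down intention projected onto the world and bottom-up prediction error revising that intention, can be given a minimal computational reading. The sketch below is my own illustration rather than one of the book’s models: an internal intention generates a predicted sensation through a fixed mapping, and the error against perceptual reality is fed back to revise the intention.

```python
# A toy of my own (not one of the book's models): circular causality as an
# iterated loop between a top-down process that projects an intention as a
# predicted sensation and a bottom-up process that feeds the prediction
# error back to revise the intention.

w = 2.0            # fixed mapping from intention to predicted sensation
intention = 0.0    # top-down state, revised through experience

for reality in (1.3, -0.7):              # the world changes; the loop re-adapts
    for step in range(100):
        error = reality - w * intention  # bottom-up: mismatch with what is sensed
        intention += 0.1 * w * error     # top-down: revise intention to shrink the gap
    print(round(intention, 4), round(w * intention, 4))
# -> 0.65 1.3, then -0.35 -0.7: each new reality pulls the intention
#    to a fresh point of compromise, with neither side fixed in advance.
```

Nothing in this loop is a symbol: intention, prediction, and error all live in the same continuous metric space, which is precisely the property the later chapters exploit.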
2 Cognitivism
One of the main forces that has advanced the study of the mind over the last 50 years is cognitivism. Cognitivism regards the mind as an externally observable object that can best be articulated with symbol systems in computational metaphors, and this approach has become successful as the speed and memory capacity of computers have grown exponentially. Let us begin our discussion of cognitivism by looking at the core ideas of cognitive science.
2.1. Composition and Recursion in Symbol Systems
The essence of cognitivism is represented well by the principle of compositionality (i.e., the meaning of the whole is a function of the meanings of the parts), specifically as expounded by Gareth Evans (1982) in regard to language. According to Evans, the principle asserts that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them (sentences are composed from sequences of words). However, its central notion, that the whole can be decomposed into reusable parts (or primitives), is applicable to other faculties, such as action generation. Indeed, Michael Arbib (1981), in his motor schemata theory, which was
published not long before Evans’ work on language, proposed that complex, goal-directed actions can be decomposed into sequences of behavior primitives. Here, behavior primitives are sets of commonly used behavior pattern segments or motor programs that are put together to form streams of continuous sensory–motor flow. Cognitive scientists have found a good analogy between the compositionality of mental processes, like combining the meanings of words into those of sentences or combining the images of behavior primitives into those of goal-directed actions “at the back of our mind,” and the computational mechanics of the combinatorial operations of operands. In both cases we have concrete objects—symbols—and distinct procedures for manipulating them in our brains. Because these objects to be manipulated—either by computers or in mental processes—are symbols without any physical dimensions such as weight, length, speed, or force, their manipulation processes are considered to be cost free in terms of time and energy consumption. When such a symbol system, comprising arbitrary shapes of tokens (Harnad, 1992), is provided with recursive functionality for the tokens’ operations, it achieves compositionality with an infinite range of expressions. Noam Chomsky, famous for his revolutionary ideas on generative grammar in linguistics, has advocated that recursion is a uniquely human cognitive competency. Chomsky and colleagues (Hauser, Chomsky, & Fitch, 2002) proposed that the human brain might host two distinct cognitive competencies: the so-called faculty of language in a narrow sense (FLN) and the faculty of language in a broad sense (FLB). FLB comprises a sensory–motor system, a conceptual–intentional system, and the computational mechanisms for recursion that allow for an infinite range of expressions from a finite set of elements. FLN, on the other hand, involves only recursion and is regarded as a uniquely human aspect of language. FLN is thought to generate internal representations by utilizing syntactic rules and mapping them to a sensory–motor interface via the phonological system as well as to the conceptual–intentional interface via the semantic system. Chomsky and colleagues admit that some animals other than humans can exhibit certain recursion-like behaviors with training. Chimps have become able to count the number of objects on a table by indicating a corresponding panel representing the correct number of objects on the table by association. The chimps became able to count up to around five objects correctly, but one or two errors creep in for more than five
objects: the more objects there are to count, the less accurate the chimps’ counting becomes. Another example of recursion-like behavior in animals is cup nesting, a task in which each cup varies in size so that the smallest cup fits into the second smallest, which in turn can be “nested” or “seriated” into larger cups. When observing chimps and bonobos cup nesting, Johnson-Pynn and colleagues (1999) found that performance differed by species as well as among individuals; some individuals could nest only two different sizes of cups, whereas others could pair three by employing a subassembly strategy, that is, nesting a small cup in a medium-size cup as a subassembly and then nesting them in a large cup. However, the number of nestings never reliably went beyond three. Similar limitations in cup nesting performance have been observed in parrots (Pepperberg & Shive, 2001) and the degu, a small rat-size rodent (Tokimoto & Okanoya, 2004). These observations of animals’ object counting and cup nesting behaviors suggest that, although some animals can learn to perform recursion-like behaviors, the depth of recursion is quite limited, particularly when contrasted with humans, in whom an almost infinite depth of recursion is possible as long as time and physical conditions allow. Chomsky and colleagues thus speculated that the human brain might be uniquely endowed with the FLN component that enables infinite recursion in the generation of various cognitive behaviors, including language.
What, then, is the core mechanism of FLN? It seems to be a recursive call of logical rules. In counting numbers, the logical rule “add one to the currently memorized number” is recursively called: Starting with the currently memorized number set to 0, it is increased to 1, 2, 3, …, to infinity as the “add one” rule is called at each recursion. Cup nesting can be performed infinitely when the logical rule “put the next smallest cup in the current nesting cup” is recursively called. Similarly, in the recursive structure of sentences, clauses nest inside other clauses, and in sentence generation the recursive substitution of one of the context-free grammar rules for each variable could generate sentences of infinite length after starting with the symbol “S” (see Figure 2.1 for an illustrative example).
[Figure 2.1 comprises two panels: on the left, under the title “Context-free grammar,” the rule set R1: S → NP VP; R2: NP → (A NP)/N; R3: VP → V NP; R4: A → small; R5: N → dogs/cats; R6: V → like; on the right, under the title “Sentence generation,” the parse tree of the example sentence “Small dogs like cats,” built by recursive rule substitutions from the top symbol S.]
Figure 2.1. On the left is a context-free grammar (CFG) consisting of a set of rules, and on the right is an example sentence that can be generated by recursive substitutions of the rules, with the starting symbol “S” allocated to the top of the parsing tree. Note that the same CFG can generate different sentences, even those with infinite length, depending on the nature of the substituting rules (e.g., repeated substitutions of R2: NP → A NP).
Chomsky and colleagues’ crucial argument is that the core aspect of recursion is not a matter of what has been learned or developed over a lifetime but of what has been implemented as an innate function in the faculty of language in a narrow sense (FLN). In their view, what is to be learned or developed are the interfaces from this core aspect of recursion ability to the sensory–motor systems or semantic systems in the faculty of language in a broad sense (FLB). They assert that the unique existence of this core recursive aspect of FLN is an innate component that positions human cognitive capability at the top of the hierarchy of living systems.
Such a view is contentious, though. First, it is not realistic to assume that we humans perform infinite recursions in everyday life. We can neither count infinitely nor generate/recognize sentences of infinite length. Chomsky and colleagues, however, see this not as a problem of FLN itself but as a problem of external constraints (e.g., a limitation in working memory size in FLB for remembering currently generated word sequences) or of physical time constraints that hamper performing infinite recursions in FLN. Second, are symbols actually manipulated recursively somewhere in our heads when counting numbers or generating/recognizing sentences? If there are fewer than six objects on a table, the number would be grasped analogically from visual patterns; if there are more than six objects, we may start to count them one by one on our fingers. In our everyday conversations we generally talk without much concern for spoken grammar: Our colloquialisms seem to be generated not by consciously combining individual words following grammatical rules, but by automatically and subconsciously
combining phrases. However, when needing to write complex embedded sentences such as those often seen in formal documents, we sometimes find ourselves consciously dealing with grammar in our search for appropriate word sequences. Thus, the notion of there being infinite levels of recursion in FLN might apply only rarely to human cognition. In everyday life, it seems unlikely that an infinite range of expressions would be used. Many cognitive behaviors in everyday life do still, of course, require some level of manipulation that involves composition or recursion of information. For example, generating goal-directed action plans by combining behavior primitives into sequences cannot be accounted for by the simple involuntary action of mapping sensory inputs to motor outputs. It requires some level of manipulation of internal knowledge about the world, yet does not involve infinite complexity. How is such processing done? One possibility might be to use the core recursive component of calling logical rules in FLN under the limitation of finite levels of recursion. Another possibility might be to assume subrecursive functions, embedded in analogical processes rather than logical operations in FLB, that can mimic recursive operations for finite levels. Cognitivism embraces the former possibility, with its strong conviction that the core aspect of cognition should reside in a symbol representation and manipulation framework. But, if we are to assume that symbols play a central role in cognition, how would symbols comprising arbitrary shapes of tokens convey the richness of meaning and context we see in the real world? For example, a typical artificial intelligence system may represent an “apple” with its features “color-is-RED” and “shape-is-SPHERE.” However, this is merely to describe the meaning of a symbol by way of other symbols, and I’m not sure how my everyday experience with apples could be represented in this form.
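To make the recursive machinery of Figure 2.1 concrete, the following sketch (my own illustration, written for this discussion rather than taken from the book) generates sentences from that grammar by recursive rule substitution. The depth cap stands in for the finite working-memory and time constraints just discussed: lift it, and the generator could in principle recurse without bound.

```python
import random

# The CFG of Figure 2.1. Each nonterminal maps to its alternative expansions;
# R2 (NP -> A NP | N) is the rule whose self-reference allows unbounded nesting.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["A", "NP"], ["N"]],
    "VP": [["V", "NP"]],
    "A":  [["small"]],
    "N":  [["dogs"], ["cats"]],
    "V":  [["like"]],
}

def generate(symbol="S", depth=0, max_depth=6):
    """Recursively substitute rules, starting from the symbol "S"."""
    if symbol not in GRAMMAR:              # a terminal word: recursion bottoms out
        return [symbol]
    if depth >= max_depth:                 # finite-resource cap: force the last,
        options = GRAMMAR[symbol][-1:]     # non-recursive alternative (NP -> N)
    else:
        options = GRAMMAR[symbol]
    expansion = random.choice(options)
    words = []
    for s in expansion:
        words += generate(s, depth + 1, max_depth)
    return words

print(" ".join(generate()))   # e.g. "small small dogs like cats"
```

With max_depth removed, repeated choices of NP → A NP could stack modifiers forever; with it, the system behaves like the finite, subrecursive competence the text suggests humans actually exhibit.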
2.2. Some Cognitive Models
This section looks at some cognitive models that have been developed to solve general cognitive tasks by utilizing the aforementioned symbolist framework. The General Problem Solver (GPS) (Newell & Simon, 1972; Newell, 1990), developed by Allen Newell and Herbert A. Simon, is one such typical cognitive model, and it made a significant impact on the subsequent direction of artificial intelligence research.
Numerous systems such as ACT-R (Anderson, 1983) and Soar (Laird et al., 1987) use this rule-based approach, although it has a crucial problem, as is shown later. GPS provides a core set of operations that can be used to solve cognitive problems in various task domains. In solving a problem, the problem space is defined in terms of the goal to be achieved, the initial state, and the transition rules. By following a means–end analysis approach, the goal to be achieved is divided into subgoals, and GPS attempts to solve each of those. Each transition rule is specified by an action operator associated with a list of precondition states, a list of “add” states, and a list of “delete” states. After an action is applied, the corresponding “add” states and “delete” states are added to and deleted from the precondition states. A rule thus specifies a possible state transition from the precondition state to the consequent state after applying the action. Let us consider the so-called monkey–banana problem, in which the goal of the monkey is to become not hungry by eating a banana. The rules defined for GPS can be as shown in Table 2.1.
Table 2.1. Example Rules in GPS
Rule 1. Action: “climb on chair” | Precondition: “chair at middle room,” “at middle room,” “on floor” | Add: “at bananas,” “on chair” | Delete: “at middle room,” “on floor”
Rule 2. Action: “push chair from door to middle room” | Precondition: “chair at door,” “at door” | Add: “chair at middle room,” “at middle room” | Delete: “chair at door,” “at door”
Rule 3. Action: “walk from door to middle room” | Precondition: “at door,” “on floor” | Add: “at middle room” | Delete: “at door”
Rule 4. Action: “grasp bananas” | Precondition: “at bananas,” “empty handed” | Add: “has bananas” | Delete: “empty handed”
Rule 5. Action: “eat bananas” | Precondition: “has bananas” | Add: “empty handed,” “not hungry” | Delete: “has bananas,” “hungry”
By considering that the goal is [“not hungry”] and the start state is [“at door,” “on floor,” “has ball,” “hungry,” “chair at door”], it can be seen that the goal state [“not hungry”] can be achieved by applying the action “eat bananas” in Rule 5 if the precondition state of [“has bananas”] is satisfied. Therefore, this precondition state of [“has bananas”] becomes the subgoal to be achieved in the next step. In the
same manner, the subgoal [“has bananas”] can be achieved by applying an action of [“grasp bananas”] with the precondition of [“at bananas”], which can in turn be achieved by applying another action of [“climb on chair”]. Repetition of this backward transition from a particular subgoal to its sub-subgoal, searching at each step for an adequate action enabling the transition, results in the generation of a chain of actions, and the goal state can be achieved from the start state by applying the resulting action sequence. The architecture of GPS is quite general in the sense that it has been applied to a variety of different task domains, including proving theorems in logic or geometry, word puzzles, and chess. Allen Newell and his colleagues (Laird et al., 1987) developed a new cognitive model, Soar, by further extending GPS. Of particular interest is its primary learning mechanism, chunking. Chunking is involved in the conversion of an experience of an action sequence into long-term memory. When a particular action sequence is found to be effective in achieving a particular subgoal, this action sequence is memorized as a chunk (a learned rule) in long-term memory. When the same subgoal appears again, this chunked action sequence is recalled rather than deliberated over and synthesized again. For example, in the case of the monkey–banana problem, the monkey may learn an action sequence of “grasp bananas” and “eat bananas” as an effective chunk for solving a current “hungry” problem, and may retain this chunk because “hungry” may appear as a problem again in the future. The idea of chunking has attracted significant attention in cognitive psychology. Actually, I myself was largely influenced by this idea after I learned about it in an artificial intelligence course given by John Laird, who has led the development of Soar for more than two decades. At the same time, however, I could not arrive at full agreement with the treatment of chunking in Soar, because the basic elements to be chunked are symbols rather than continuous patterns, even at the lowest perceptual level. I speculated that the mechanism of chunking should be considered at the level of continuous perceptual flow rather than of symbol sequences in which each symbol already stands as an isolated segment within the flow. Later sections of this book explore how chunks can be structured out of continuous sensory–motor flow experiences. First, however, the next section introduces the so-called symbol grounding problem, which cognitive models built on symbolist frameworks inevitably encounter.
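To see how little machinery this backward chaining needs, here is a compact sketch of means–end search over the rules of Table 2.1. It is my own illustration of the idea, not Newell and Simon’s actual implementation, and it omits the backtracking cleanup that this toy problem does not require; the chunks dictionary at the end gestures at Soar-style chunking by caching a solved goal’s action sequence for later reuse.

```python
# Each rule: (action, preconditions, add list, delete list), as in Table 2.1.
RULES = [
    ("climb on chair",
     {"chair at middle room", "at middle room", "on floor"},
     {"at bananas", "on chair"}, {"at middle room", "on floor"}),
    ("push chair from door to middle room",
     {"chair at door", "at door"},
     {"chair at middle room", "at middle room"}, {"chair at door", "at door"}),
    ("walk from door to middle room",
     {"at door", "on floor"},
     {"at middle room"}, {"at door"}),
    ("grasp bananas",
     {"at bananas", "empty handed"},
     {"has bananas"}, {"empty handed"}),
    ("eat bananas",
     {"has bananas"},
     {"empty handed", "not hungry"}, {"has bananas", "hungry"}),
]

def achieve(goal, state, plan):
    """Backward chaining: find a rule whose add list contains `goal`,
    recursively achieve its preconditions, then apply the rule."""
    if goal in state:
        return state
    for action, pre, add, delete in RULES:
        if goal not in add:
            continue
        new_state = state
        for subgoal in pre:                    # each precondition is a subgoal
            new_state = achieve(subgoal, new_state, plan)
            if new_state is None:
                break
        if new_state is not None:
            plan.append(action)
            return (new_state - delete) | add  # apply the action's effects
    return None

# The book's start state includes "has ball"; since no rule in Table 2.1
# frees the hand, this sketch starts the monkey "empty handed" instead.
start = {"at door", "on floor", "empty handed", "hungry", "chair at door"}
plan, chunks = [], {}
achieve("not hungry", start, plan)
chunks["not hungry"] = list(plan)   # a chunk: reuse this sequence next time
print(plan)  # ['push chair from door to middle room', 'climb on chair',
             #  'grasp bananas', 'eat bananas']
```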
2.3. The Symbol Grounding Problem
The symbol grounding problem as conceptualized by Steven Harnad (1990) is based on his assertion that the meanings of symbols should originate from a nonsymbolic substrate like sensory–motor patterns and that, as such, symbols are grounded bottom up. To give shape to this thought, he proposed, as an abstract model of cognitive systems, a hybrid system consisting of a symbol system in the upper level and a nonsymbolic pattern-processing system in the lower level. The nonsymbolic pattern-processing system functions as the interface between sensory–motor reality and abstract symbolic representation by categorizing continuous sensory–motor patterns into sets of discrete symbols. Harnad argued that meaning, or semantics, in the hybrid system would no longer be parasitic on its symbol representation but would become intrinsic to the whole system operation, as such representation is now grounded in the world. This concept of a hybrid system has similarities to that of FLN and FLB advocated by Chomsky and colleagues, in the sense that it assumes a core aspect of human cognition in terms of logical symbol systems, which can support up to an infinite range of expressions, and peripheries as the interface to a sensory–motor or semantic system that may not be involved in composition or recursion in depth. This idea of a hybrid system reminds me also of Cartesian dualism. According to Descartes, the mind is a thinking thing that is nonmaterial whereas the body is nonthinking matter, and the two are distinct. The nonmaterial mind may correspond to FLN or symbol systems that are defined in a nonphysical discrete space, and the body to sensory–motor processes that are defined in physical space. The crucial question here is how these two completely distinct existences that do not share the same metric space can interact with each other. Obviously, our minds depend on our physical condition, and the freshness of the mind affects the swiftness of our every move. Descartes showed some concern about this “problem of interactionism,” asking how a nonmaterial mind can cause anything in a material body, and vice versa. Cognitive scientists in modern times, however, seem to consider—rather optimistically, I think—that some “nice” interfaces would enable interactions between the two opposite poles of nonmatter and matter. Let’s consider the problem by examining robot navigation as an example, reviewing my own work on the subject (Tani, 1998). A typical mobile robot, equipped with simple range sensors, may travel around an office environment while taking the range reading that
provides an estimate of the geometrical shapes in the surrounding environment at each time step. The continuous flow of the range image pattern is categorized into one of several predefined landmark types, such as a straight corridor, a corner, a T-branch, or a room entrance. The upper level constructs a chain representation of landmark types by observing sequential outputs of the categorizer while the robot explores the environment. This internal map consists of nodes representing position states of the robot associated with encountered landmark types, and of arcs representing transitions between them associated with actions such as turning right/left and going straight. This representation takes exactly the same form as a symbolic representation known as a finite state machine (FSM), which consists of a finite number of discrete states and their state transition rules. It is noted that the rule representation in GPS can be converted into this FSM representation by considering that each rule description in GPS can be expanded into two adjacent nodes connected by an arc in the FSM. Once the robot acquires the internal map of its environment, it becomes able to predict the next sensation of landmarks on its travels by looking at the next state transition in the FSM. When the actual perception of the landmark type matches the prediction, the robot proceeds to the prediction of the next landmark to be encountered. An illustrative description is shown in Figure 2.2.
[Figure 2.2 depicts the hybrid architecture as two stacked levels above the robot and its environment: an FSM whose nodes, marked with landmark types such as C and T, are linked by action transitions labeled “Straight” and “Right,” and below it a categorizer that maps the continuous sensory pattern at each time step t to a discrete landmark type such as “T-Branch.”]
Figure 2.2. Landmark-based navigation of a robot using a hybrid-type architecture consisting of a finite state machine and a categorizer. Redrawn from Tani (1998).
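Before turning to what goes wrong, the following toy rendering of the Figure 2.2 architecture (my own sketch, not the implementation from Tani, 1998) makes the two levels concrete: a categorizer discretizes continuous range readings into landmark symbols, and the FSM checks each perceived landmark against the one its current transition predicts.

```python
# Toy rendering of the Figure 2.2 architecture (not Tani's actual code).
# FSM: (current_state, action) -> (next_state, predicted_landmark)
FSM = {
    ("s0", "straight"): ("s1", "corridor"),
    ("s1", "straight"): ("s2", "T-branch"),
    ("s2", "right"):    ("s0", "corner"),
}

def categorize(range_reading):
    """Stand-in categorizer: collapse a continuous reading to a symbol."""
    mean = sum(range_reading) / len(range_reading)
    if mean > 2.0:
        return "T-branch"
    if mean > 1.0:
        return "corridor"
    return "corner"

state = "s0"
for action, reading in [("straight", [1.2, 1.5, 1.4]),
                        ("straight", [2.8, 2.5, 3.1]),
                        ("right",    [0.4, 0.2, 0.3])]:
    next_state, predicted = FSM[(state, action)]
    perceived = categorize(reading)
    if perceived != predicted:
        # An illegitimate symbol halts the FSM: the robot is simply lost,
        # with no graded way to negotiate between expectation and reality.
        raise SystemExit(f"lost: expected {predicted}, perceived {perceived}")
    state = next_state
print("route completed; final FSM state:", state)
```

One noisy reading that crosses a category boundary, and the run dies at the raise statement; nothing in the discrete map can bend to meet the continuous evidence halfway.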
Problems occur when this matching process fails. The robot becomes lost because the operation of the FSM halts upon receiving an illegitimate symbol/landmark type. This is my concern about the symbol grounding problem. When systems involve bottom-up and top-down pathways, they inevitably encounter inconsistencies between the two pathways of top-down expectation and bottom-up reality. The problem is how such inconsistencies can be treated internally without causing a fatal error, halting the system’s operations. It is considered that both levels are dually responsible for any inconsistency and that they should resolve any conflict through cooperative processes. This cooperation entails iterative interactions between the two sides through which optimal matching between them is sought dynamically. If one side pushes forward a little, the other side should pull back elastically so that a point of compromise can be found through iterative dynamic interactions. The problem here is that symbol systems defined in a discrete space appear to be too solid to afford such dynamic interactions with the sensory–motor system. This problem cannot be resolved simply by implementing certain interfaces between the two systems, because the two simply do not share the same metric space enabling smooth, dense, and direct interactions.
2.4. Context
Another concern is how well symbol systems can represent the reality of the world. Wittgenstein once said, “Whereof one cannot speak, thereof one must be silent,” meaning that language, as a formal symbol system, has its limitations for fully expressing philosophical ideas. Not only in philosophy, but in everyday life, too, there is always something that cannot be expressed explicitly. Context, or background, is an example. Context originally means the discourse that surrounds a language unit and helps to determine its interpretation. In a larger sense, it also means the surroundings that specify the meaning or existence of an event. Spencer-Brown (1969) highlighted a paradox in his attempts to explicitly specify context in his formulation of the calculus of indications. Although details of his mathematical formulas are not introduced here, his statement can be interpreted to mean that indexing the
current situation requires the indexing of its background or context. Because indexing the background requires further indexing of the background of the background, the operation of indexing situations ends up as an infinite regression. Spencer-Brown wrote that, in this aspect, every observation entails a symbol, an unwritten cross, where the cross operation denotes indexing of the background. Let’s imagine you see a bottle-like shape. This situation can be disambiguated by specifying its immediate background (context), namely that a bottle-like shape was seen immediately after you opened the refrigerator door, which means that the bottle is chilled. Further background information that the refrigerator was opened after you went back to your apartment after a long day at work would mean that what you see now is a bottle of chilled beer waiting to be drunk. There is no logical way to terminate this regression, yet you can still reach for the bottle of beer to drink it! Although FLN may have the capability for infinite regression, it is hard to believe that our minds actually engage in such infinite computations. We live and act in the world surrounded or supported by context, which is always implicit, uncertain, and incomplete for us at best. How can a formal symbol system represent such a situation?
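Read computationally, Spencer-Brown’s regression is recursion with no base case. The toy below (my own illustration, not his calculus) makes the point: the only way to stop indexing backgrounds is to impose an arbitrary cutoff from outside the logic, just as the beer is reached for without the regress ever being completed.

```python
# Indexing a situation demands indexing its background, which demands
# indexing the background of that background, and so on: recursion with
# no base case, terminable only by an external, extra-logical cutoff.

def index(situation, depth=0, patience=3):
    if depth >= patience:                 # no logical ground is ever reached;
        return situation + " | ..."       # we simply stop and act anyway
    background = "context of (" + situation + ")"
    return situation + " | " + index(background, depth + 1, patience)

print(index("a bottle-like shape"))
```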
2.5. Summary
We humans definitely have internal images of our surrounding world. We can extract regularities from our experiences and observations both consciously and unconsciously, as evidenced by the fact that we can acquire language skills involving grammar. Also, we can combine the acquired rules to create new images, utterances, and thoughts. Accounting for this aspect, cognitivists tend to assume that symbols exist to be manipulated in our heads. My question, though, is: what is the reality of those symbols we suppose to be in our heads? Is symbol representation and manipulation an operational principle in the cognitive mind? If so, my next questions would be how symbols comprising arbitrary shapes of tokens can interact with sensory–motor reality, and how they can access matters involving context, mood, or tacit knowledge that are considered difficult for formal symbol systems to deal with. It is also difficult to represent the state of consciousness with them. It is presumably hard to differentiate between doing something
consciously and unconsciously in the process of merely manipulating symbols by following logic. If we attempt to model or reconstruct the mind, it should be essential to reconstruct not only its rational thinking aspects but also the feelings that accompany our daily experiences, such as consciousness as the vivid feeling of qualia characterizing various sensations. But if symbol systems cannot deal with such matters, what would be a viable solution? Indeed, this book proposes an abrupt transition from the aforementioned conventional symbolist framework. The main proposal is to consider that what we have in our brains as “symbol” is not just an arbitrary shape of token but the dynamic activity of physical matter embedded in continuous spatiotemporal space. Such dynamic activity of matter, adequately developed, might enable compositional yet vivid and contextual thinking and imaging in our brains. A crucial argument would be that such cognitive minds can be naturally situated in the physical world because the two share the same metric space for interaction. The next chapter addresses this very problem from the standpoint of a different discipline, that of phenomenology. The objective of phenomenology is not only to investigate the problems of the mind but also to search for how the problems themselves can be constituted from the introspective view. Readers will find that the discipline of phenomenology is quite sympathetic to the aforementioned dynamical systems view.
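Later chapters, particularly Chapter 5, develop this dynamical alternative in detail. As a foretaste, here is a minimal toy of my own (not one of the book’s models) showing how symbol-like discreteness can live inside continuous dynamics: a one-dimensional state with two attractors behaves like a pair of discrete tokens, yet remains matter in the same metric space as its inputs.

```python
# Minimal toy (not a model from the book): the flow dx/dt = x - x**3 + u
# has stable attractors at x = -1 and x = +1 when u = 0. The attractors act
# like two discrete "symbols," but the state stays continuous, so input
# from the world can deform it or push it across to the other basin.

def simulate(x, inputs, dt=0.01, steps_per_input=500):
    trajectory = []
    for u in inputs:
        for _ in range(steps_per_input):
            x += dt * (x - x**3 + u)      # Euler integration of the flow
        trajectory.append(round(x, 3))
    return trajectory

# No input lets the state settle into one basin; a sustained opposing
# input carries it across to the other "symbol," where it then stays.
print(simulate(0.1, inputs=[0.0, -1.0, 0.0]))   # -> [1.0, -1.325, -1.0]
```

A discrete symbol could only be overwritten; this state is pushed, resists, and yields, which is exactly the elastic interaction the symbol grounding discussion found missing.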
3 Phenomenology
Phenomenology originated in Europe at the beginning of the 20th century with Edmund Husserl's study of so-called phenomenological reduction, through which the analysis of the natural world is based purely on the conscious experiences of individuals. As this chapter shows, Husserl's study subsequently evolved and was extended by the existentialism of Martin Heidegger and the embodiment of Maurice Merleau-Ponty and others. We should also not forget to mention William James, who was born 17 years earlier than Husserl in the United States. Although James is best known as the founder of modern psychology, he also provided numerous essential philosophical ideas about the mind, some of which are quite analogous to Husserl's phenomenology. In Japan, Kitaro Nishida (1990) developed his original thinking, influenced by Buddhist meditation, which turned out to include ideas with some affinity to those of Husserl and James. Phenomenology asks us to contemplate how the world can exist for us and how such a belief can be constituted from our experiences, by suspending our ordinary assumption that the world exists as a physical fact from the outset. Here, the question of how the world can be constituted in our subjective reflection might be analogous to the question of how knowledge of the world can be represented in cognitive science studies. Phenomenology, however, focuses more on phenomena themselves, through direct perception or pure experience, which has
not yet been articulated either by conception or language. For example, a rose exists in our subjectivity as a conscious phenomenon of a particular smell or a particular visual shape, but not by our knowledge of its objective existence. This discipline then focuses purely on phenomena and questions the existence of the world from such a viewpoint. However, the discipline also explores the being of the cogito (how cognition arises) at the higher level by examining how it can be developed purely through the accumulation of perceptual experiences. Thus, phenomenology asks how cognition is constituted from direct perception, a line of questioning deeply related to the later discussions on how robotic agents can develop views or recognition of the world from their own sensory–motor experiences.
3.1. Direct Experience

Let us begin by examining what direct experience means in phenomenology. It is said that Husserl noticed the importance of direct experience when coming across Mach's perspective (Figure 3.1) (T. Tani, 1998). It is said that Mach drew the picture to represent what he sees with his left eye while closing his right one. From this perspective, the tip of his nose appears at the right of the frame with his eye socket curving upward. Although we usually do not notice this sort of perspective, it should represent the direct experience that we then reconstruct in our minds. Husserl considered that an examination of such direct experience could serve as a starting point to explore phenomena. Around the same time, a notable Japanese philosopher, Kitaro Nishida, introduced a similar idea in terms of pure experience, writing that:

For example, the moment of seeing a color or hearing a sound is prior not only to the thought that the color or sound is the activity of an external object or that one is sensing it, but also to the judgment of what the color or sound might be. In this regard, pure experience is identical with direct experience (Nishida, 1990, p. 3).

For Nishida, pure experience is not describable by language but transcends it:

When one directly experiences one's own state of consciousness, there is not yet a subject or an object… . (Nishida, 1990, p. 3)
Figure 3.1. Ernst Mach’s drawing. Source: Wikimedia Commons.
Here, what exactly does this phrase “there is not yet a subject or an object” mean? Shizuteru Ueda (1994), who is known for his studies on Nishida’s philosophy, explains this by analyzing the example utterance, “The temple bell is ringing.” If it is said instead as “I hear the temple bell ringing,” the explication of “I” as the subject conveys a subtle expression of subjective experience at the moment of hearing. In this interpretation, the former utterance is considered to express pure experience in which subject and object are not yet separated by any articulation in the cogito. This analysis is analogous to what Husserl recognized from Mach’s perspective.
3.2. The Subjective Mind and Objective World

We might ask, however, how much the phenomena of experience depend on direct perception. Is our experience of perception the same as that of
infants, in the sense that any knowledge or conception in the cogito does not affect them at all? In answer, we have sensationalism on one side, which emphasizes direct experiences from the objective world, and on the other we have cognitivism, which emphasizes subjective reflection and representation of the world. But how did these conflicting poles of the subjective mind and the objective world appear? Perhaps they existed as one entity originally and later split off from each other. Let's look then at how this issue of the subjective and the objective has been addressed by different phenomenological ideas. In Husserl's (2002) analysis of the structural relationship between what he calls appearance and that which appears in perceiving an object, he uses the example of perceiving a square, as shown in Figure 3.2. In looking at square-like shapes in everyday life, despite them having slightly unequal angles, we usually perceive them to be squares with equal right angles. In other words, a square could "appear" with unequal angles in various real situations, when it should have equal right angles in the ideal: in such a case, a parallelogram or trapezoid is the "appearance" and the square is "that which appears" as the result of perception. At this point, we should forget about the actual existence of this square in the physical world because this object should, in Husserl's sense, exist only through idealization. Whether things exist or not is just a subjective matter rather than an objective one. When things are constituted in our minds, they exist regardless of their actual being. This approach that puts aside correspondence to actual being is called epoché, or suspension of belief.
Figure 3.2. Husserl’s ideas on the structural relationship between “appearance” and “that which appears” in perceiving a square, as an example.
Husserl considers that direct experience has intentionality toward representation. This intentional process of constituting representation from direct experience actually entails consciousness. Therefore, the phenomena of experience cannot be accounted for only by direct experience at the level of perception; they must also be accounted for by conscious representation at the level of the cogito. Ultimately, it can be said that the phenomena of experience stand on the duality of these two levels. Incidentally, from the preceding text, readers might speculate that the level of the cogito and the level of perception are treated as separate entities in phenomenology. However, phenomenology does not seek to take that direction and instead attempts to explore how the apparent polarity of, for example, the cogito and perception, subjectivity and objectivity, and mind and material could have appeared from a single unified entity in the beginning. Although understanding such constitutional aspects of the polarity (i.e., how the polarity developed) continues to be a subject of debate in phenomenology, interesting assumptions have been made about there being some sort of immanence enabling self-development of such structures. For example, Husserl considers how the cogito level of dealing with temporal structure submerged in a stream of experience could emerge from the direct perceptual level, as explained in detail later. Nishida (1990) also considers that the subject and object should be one unified existence rather than taken originally as independent phenomena. He, however, argues that the unified existence could have internal contradictions that lead to bifurcation, or the division of the unity into the subject and object that we usually grasp. He suggests that the phenomenological entity simply continues to develop by repeating these unification and division processes. Merleau-Ponty (1968) professes that this iteration of unification and division would take place in the medium of our bodies, as he considers that the two poles of the subjective mind and the objective material actually meet and intermingle with each other there. He regards the body as ambiguous, being positioned between the subjective mental world and the objective physical world. Heidegger, on the other hand, devoted himself to exploring the more fundamental problem of being by working on what it means to be human, rather than splitting the problem into that of subject and object. And through his approach to the problem of being he turned out to be successful in showing how subjectivity and objectivity can appear.
What follows examines the philosophical arguments concerning the subjective mind and the objective world in more depth, along with related discussions that include time perception as propounded by Husserl, being-in-the-world set forth by Heidegger, embodiment by Merleau-Ponty, and the stream of consciousness by James. Let’s begin by looking closely at each of these, starting with Husserl’s conception of the problem of time perception.
3.3. Time Perception: How Can the Flow of Subjective Experiences Be Objectified?

To Husserl, the world should consist of objects that the subject can consciously meditate on or describe. However, he noticed that our direct experiences do not originate in the form of such consciously representable objects but arise from a continuity of experience in time that exists as pure experience. Analyzing how a continuous flow of experience can be articulated, or segmented, into describable objects or events brought him to the problem of time perception. Husserl asks how we perceive temporal structure in our experiences (Husserl, 1964). It should be noted that the "time" discussed here is not physical time having dimensions of seconds, minutes, and hours but rather time perceived subjectively, without objective measures. The problem of time perception is a core issue in this book because both humans and robots that generate and recognize actions have to manage continuous flows of perception by articulating them (via segmentation and chunking), as is detailed later. In considering the problem, Husserl presumed that time consists of two levels: so-called preempirical time at a deep level and objective time at a surface level. According to him, the continuous flow of experience becomes articulated into consciously accessible events by its development through these phenomenological levels. This idea seems born from his thinking on the structural relationship between "appearance" and "that which appears" mentioned earlier in this chapter. At the preempirical level, every experience is implicit and yet to be articulated, but there is some sort of passive intention toward the flow of experience, which he refers to as retention and protention. His famous explanatory example is about hearing a continuous melody such as "do-re-mi." When we hear the "re" note, we would still perceive a lingering impression of
“do” and at the same time we would anticipate hearing the next note of “mi.” The former refers to retention and the latter protention. The present appearance of “re” is called the primar y impression. These three terms of retention, primary i mpression, and protention are used to designate the experienced sense of the immediate past, the present, and the immediate future, respectively. They are a part of automatic processes and as such cannot be monitored consciously. The situation is similar to that of the utterance “The temple bell is ringing” mentioned earlier, in the sense that the subject of this utterance is not yet consciously reflected. Let’s consider the problem of nowness in the “do- re-mi” example. Nowness as experienced in this situation might be taken to correspond with the present point of hearing “re” with no duration and nothing beyond that. Husserl, however, considered that the subjective experience of nowness is extended to include the fr inges of the experienced sense of both the past and the futu re, that is, in ter ms of retention and protention: Retention of “do” and protention of “mi” are included in the primary impression of hearing “re.” This would be true especially when we hear “do-re-mi” as the chunk of a familiar melody rath er than as a sequence consisting of independent notes. Having now understood Husserl’s notion of nowness in terms of retention and protention, the question arises: Where is nowness bounded? Husserl seems to think t hat the immediate past does not belong to a representational conscious memory but merely to an impression. Yet, how could the immediate past, experienced just as an impression, slip into the distant past but still be retrieved through conscious memory, as Francisco Varela (1999) once asked in the context of neurophenomenology? Conscious memory of the past actually appears at the level of objective time, as described next. This time, let’s consider remembering hearing the slightly longer sequence of notes in “do-re-mi-fa-so-la.” In this situation, we can recall hearing the final “la” that also retains the appearance of “so” by means of retention, and we can also recall hearing the same “so” that retains the appearance of “fa,” and so on in order back to “do.” By means of consciously unifying immediate pastness in a recall with presentness in the next recall i n the retention train, a sense of objective time emerges as a natural consequence of organizing each appearance into one consistent linear sequence. In other words, objective time is constituted when the srcinal experience of continuous flow (in this case the melody) is articulated into a sequence of objectified events (the notes) by means of consciously recalli ng and unif ying each appearanc e. There is a fundamental
difference between an impression that will sink into the horizon in preempirical time and the same past which is represented in objective time. The former is a present, living experience of passing away, whereas the latter can be constituted as consciously represented or manipulable objects, but only after the original experience is retained. Therefore, the latter may lack the pureness or vividness of the original experience, yet may fit well with Husserl's goal that pure experience can be ultimately represented as logical forms dealing with discrete objects and events. Husserl proposed two types of intentionality, each heading in a different direction: transversal intentionality refers to integration of the living-present experience by means of retention, primary impression, and protention in preempirical time; longitudinal intentionality affords an immanence of time structures (from preempirical time to objective time) by means of conscious recall of retained events in the retention train. Consequently, this intentionality might be considered to be retention of retention itself (a reflective awareness of this experience). In the process of interweaving these two intentionalities (double intentionality) into the unitary flow of consciousness, the original pure experience is objectified and simultaneously the subjectivity or ego of this objectifying process emerges. In his later years, Husserl introduced an analysis at an even deeper level, the absolute flow level. Here, neither retention nor protention has yet appeared—only flow exists. However, this flow is not homogeneous; each appearance has its own duration. Tohru Tani (1998) interpreted this aspect by saying that consciousness flows as well as stagnates, characterizing the uniqueness of the absolute flow of consciousness and setting it apart from consciousness as developed elsewhere. This alternating flow and stagnation is primordial, an absolutely given dynamic which is nonreducible. The passive intentional acts of retention and protention that dimensionalize experience along the continuum of temporality in the next level originate from this primordial stage of consciousness, and objective time arises from there. In sum, Husserl's persistent drive to reduce the ideas and knowledge of man to direct experiences is admirable. However, his motivation toward a logically manipulable ideal representation of the world via reflection seems to me problematic, in that it has exactly the same problem as the symbol grounding problem in cognitive models. Dreyfus (Dreyfus & Dreyfus, 1988), who is well known for his criticism of artificial intelligence research, argues that the main computational scheme
based on logical inferences and categorical representation of knowledge in modern cognitive science or artificial intelligence originated from the ideas of Husserl. Actually, Husserl (1970) had already toyed with an idea similar to the frame system, a notable invention of Marvin Minsky, which introduced domain specificity into the logical descriptions of objects and the world, but he finally admitted defeat in the face of the infinite possibilities of situations or domains. However, Heidegger, as a disciple of Husserl, actually took an alternative route to escape this predicament, as we discover next.
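Before moving on, it may help to fix the terminology of the preceding analysis in a manipulable form. The toy sketch below, in Python, is purely my illustration, not anything Husserl or the author specifies: the window depth of two notes and the list representation of the melody are arbitrary assumptions. Each lived "now" bundles retention, the primary impression, and protention; unifying the recalled appearances then yields the analogue of objective time as one linear sequence.

    # Toy illustration of retention, primary impression, and protention
    # in Husserl's "do-re-mi" example (a depth of 2 is an arbitrary choice).
    melody = ["do", "re", "mi", "fa", "so", "la"]

    def lived_present(t, depth=2):
        retention = melody[max(0, t - depth):t]      # fading immediate past
        impression = melody[t]                       # the present appearance
        protention = melody[t + 1:t + 1 + depth]     # anticipated immediate future
        return retention, impression, protention

    for t in range(len(melody)):
        print(t, lived_present(t))

    # "Objective time": consciously unifying each recalled appearance
    # into one consistent linear sequence of events.
    objective_time = [lived_present(t)[1] for t in range(len(melody))]
    print("objectified sequence:", objective_time)

Needless to say, such a sketch models none of the consciousness involved; it only makes the structure of the terms concrete before we move on.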
3.4. Being-in-the-World

Heidegger is considered by many to be one of the greatest philosophers of modern times, changing the direction of philosophy dramatically by introducing his thinking on existentialism (Dreyfus, 1991). Although a disciple of Husserl, once he became inspired by his own thoughts on the subjective constitution of the world, Heidegger subsequently departed from Husserl's phenomenology. It is said that Heidegger noticed a philosophical problem concerning the cogito and consciousness, a problem that was considered by Descartes as well as Husserl and yet fully overcome by neither. Descartes considered that the cogito, a unique undoubtable being, should be taken as the initial point of any philosophical thought after everything in the world is discarded for the doubtfulness of its being. He concluded that if he doubted, then something or someone must be doing the doubting; therefore the very fact that he doubted proved his existence (Williams, 2014). Husserl, taking on this thought, presented his idea that the world and objects should exist in terms of conscious representations in the cogito and that such conscious representations ought to be ideal ones. Heidegger just could not accept the unconditional prior existence of the cogito. Nor could he accept an ideal and logical representation of the world that the cogito supposedly constitutes. Instead, he raised the more fundamental question of asking what it means to be human, while avoiding tackling directly the problems of cogito versus perception, subjectivity versus objectivity, and mental versus material. It is important to note that Heidegger sought not to obtain an objective understanding of the problem but rather to undertake a hermeneutic analysis of it.
Hermeneutics is an approach that attempts to deepen the understanding of targets while having prior estimates, or biases, of them that are adequately modified during the process of understanding. For example, when we read a new piece of text, a preunderstanding of the author's intention would help our understanding of the content as we go along. However, hermeneutics possesses an inherent difficulty because preunderstanding (bias) originates from intuitions in a context-dependent way, and there is a potential danger of being caught up in a loop of interpretation, the so-called hermeneutic circle. Because we can understand the whole in terms of its parts and the parts only through their relationship to the whole, we experience an unending interpretative loop. Despite this difficulty, Heidegger holds that there are some fundamental problems, like what it means to be human, which can only be understood in this way. It is said that we take being for granted, but cannot articulate it precisely when asked to do so. In his classic text, Being and Time, he attempts to elucidate the meaning of being via hermeneutic cycling, beginning with this same vague preunderstanding. It is his thoughts on understanding by hermeneutic cycling that form the essential philosophical background to the central theme of this book, namely emergent phenomena, as discussed later. For now, let's examine Heidegger's famous notion of being-in-the-world (Heidegger, 1962) by looking at his interpretation of the ways of being in relation to equipment. Heidegger focuses on the purposeful exercise of naive capacities as extended by equipment and tools. For example, he asks what it means that a hammer exists. It is not sufficient to answer that it exists as a thing made from cast iron and wood, because such an answer merely describes its objective features. Rather, the meaning of being of a hammer must be approached by way of its employment in daily activities, something like "The carpenter building my house is hitting nails with it." Such an account of nails being hit with a hammer, the hammer being used by the carpenter, and the carpenter building my house implies the way of presently being for each of these entities as situated among others: None exist independently but are unified in their existence via the preunderstanding of how each interacts in the constitution of a situation characterized by purposeful activity. Heidegger asserts that the being of equipment is mostly "transparent." Put another way, the existence of pieces of equipment is not noticed much in our daily usage of them. When a carpenter continues to hit a nail, the hammer becomes transparent to him: The hammer and
the nail are absorbed in a connected structure, the purposeful activity that is house building. However, when he fails to hit the nail correctly, the unified structure breaks down and the independence of each entity becomes noticeable. Their relative meanings become interpretable only in such a breakdown. In the breakdown of the once-unified structure, the separated entities of the subject and the object become apparent with self-questioning, like "why did 'I' fail?" and "what's wrong with the hammer and the nail?". In this way, it is considered that the hermeneutic approach can provide an immanent understanding of metaphysical existence, such as consciousness, through cycles of self-corrective analysis. Heidegger recognizes that ordinary man has rare opportunities to reflect on the meaning of his own way of being, occupied as he is with the daily routines of life, and he regards such a state of being as inauthentic. Although man can live in his neighboring community occupied with "idle talk" and trivia, he cannot become individuated, ultimately recognizing and taking responsibility for his or her existence, in such a way. Man in this case lives his daily life only in the immediate present, vaguely anticipating the future and mostly forgetting the past. However, Heidegger tells us that this way of being can be changed to authentic being when man thinks positively about the possibility of his death, which could occur at any moment and not necessarily so very far into the future. Death is an absolutely special event because it is the ultimately individuating condition that cannot be shared with others. Death is to be regarded as the absolutely certain impossibility of being further related to any other kind of being, and when confronted in this way it prompts the authentic being to head toward its own absolute impossibility. Here, we must focus on Heidegger's brilliant notion that the present is born via the dynamic interplay between a unique agent's projected future possibilities and its past. In this process, one reclaims one's self from the undifferentiated flow of idle chatter and everyday routine. This is authenticity. The authentic agent has the courage to spend the time of his or her life in becoming an agent of change, to transform the situation into which one is "thrown" (the clearing of being as inherited, as one is born into it) into that ideal self-situation that characterizes one's unique potential. The inauthentic agent hides from this potential, and rather invests his or her time in distractions, "idle chatter," merely repeating established routines and defending established conventions regardless
of suboptimal and even grossly immoral results. Thus, Heidegger establishes the subjective sense of temporality (rather than objective time) as the ground of authentic being, whereas the inauthentic being tries to nullify this subjective sense of time by ignoring his or her mortality and retreating into the blind habit and routine that is characteristic of "fallenness." Now, we see that his notion of temporality is drastically different from Husserl's. Husserl considers that temporality appears as the result of subjective reflection that articulates direct experiences of sensory streams into consciously manipulable object or event sequences. Heidegger, on the other hand, shows how the differentiated aspects of past, present, and future arise from the mortal condition. Temporality is the dynamic structure of being, in light of which anything at all comes to matter, and from which any inquiry into the nature of being, including any derivative understanding of time as sequence, for example, is ultimately drawn. Next, there are other aspects of mind to review, including the role of the body in mediating interactions between the mind and the material world.
3.5. Embodiment of Mind

In the philosophy of embodiment developed by Merleau-Ponty, we can easily find the influence of Heidegger's being-in-the-world. Merleau-Ponty's notion of embodiment has been recognized as a notion of ambiguity, and with this ambiguity he successfully avoided tackling Cartesian dualism directly. As mentioned before, Descartes thought that the world consists of two extremes—the nonmaterial subjective mind and the material objective things—and this invited a problem of interaction. The problem is how to account for the causal interaction among the nonmaterial mind, the material body, and the material world while these effectively exist in different spaces. Against this background, Merleau-Ponty developed his thoughts on embodiment, asking at which pole the body, as a part of our being, should be taken to lie. Although the body in terms of flesh can be regarded as material, we actually often experience the body as an aspect of mind, which is regarded as nonmaterial by Descartes. For example, our cheeks turn red when we get angry and tears start to fall when we feel sad. In response, Merleau-Ponty proposes that we consider
the body to be an ambiguous existence belonging to neither of these two extremes. Merleau-Ponty examined various means of interaction between mind and body. For example, he presented an analysis of a blind man with a stick (Merleau-Ponty, 1962). The stick becomes an object when the blind man grasps it in order to guide his movements. At the same time, however, it becomes a part of his body when he scans his surroundings while walking, touching its tip to things, like tactile scanning with the finger. Although this is an interesting example showing the possibility of body extension, it also suggests the possibility that the range of the self can be extended or shrunk through the use of tools and artifacts. In another example, his analysis of the phenomenon of phantom limbs might indicate the complete opposite of the blind man's case. It is said that people who have had a limb amputated often still experience pain in the amputated limb that no longer exists. Merleau-Ponty explained the phenomenon in terms of "refusal of deficiency," which is the implicit negation of what runs counter to the natural momentum that throws us into our tasks, our cares, our situation, and our familiar horizons (Merleau-Ponty, 1962). It can be summarized then that the analysis of the blind man with his stick indicates the possibility of extension of the familiar horizon associated with daily use of the stick, whereas the analysis of phantom limbs indicates another possibility, one of refusal of the sudden shrinking of this once familiar horizon. These examples might help us to understand how the horizon of subjective possibility is constituted via daily interactions between the body and the world, thereby enriching our understanding of being in the world. Along the same line of thought, Merleau-Ponty addressed the problem of body schema—the integrated image of the body—by conducting an analysis of a patient with neurological blindness. His patient, Schneider, had lesions in vision-related cortical areas. Although he had a problem recognizing objects visually, he could pick up a cup to have a drink or make a fire by striking a match without problems "seeing" the cup or the match. So, he could see the shapes and outlines of objects but needed a reasoning process to identify them. When he was asked to point to his nose, he had difficulty doing so, but he could blow his nose with a handkerchief. He had difficulty pointing to or moving a part of his body when asked to do so unless he deliberated over his movement from an objective view ahead of time. In short, he could perform concrete movements in natural situations in daily life very easily,
but had difficulty performing abstract movements without context and without an objective view. Merleau-Ponty came to the conclusion that such concrete movements situated in everyday life are fundamental to the consideration of body schema. In concrete movements, our body or body part is not an object that we move in an objective space. Rather, it is our living body, the body as a subject, that we move in a bodily space. These movements performed by our living body are organized in familiar situations in the world, wherein the body comprehends its world and objects without explicitly representing or objectifying them. The body communicates with them through a skill or tacit knowledge, by making a direct reference to the world and its objects. This direct reference implies the fundamental structure of being-in-the-world, as Heidegger discussed in terms of Dasein. Merleau-Ponty's (1962) analysis of synesthesia is also worth introducing here. Synesthesia, a neurological condition in which sensation in one modality unconsciously evokes perception in another, has been reported in a variety of forms. Some synesthetes perceive colors upon seeing certain shapes or letterforms, feel textures on hearing particular sounds, or experience strong tastes on hearing certain words. Merleau-Ponty speculates that these sensory slips should have some clear meaning behind them, rather than simply being perceptual side effects, a meaning that would account for how we humans engage with the world. Indeed, perception of objects in the world is achieved in the iterative interactions between multiple modalities of sensation, by reentrant mechanisms established in the coupling of us and the world. Merleau-Ponty refutes the ordinary scientific view of modularity, which tries to understand the reality of perception by reducing it to the sum of each separate modality of sensation. His approach is to see perception as an ongoing structuring process of the whole, or Gestalt, which appears in the communicative exchanges between the different modalities of sensation. He also refutes the notion of separating perception from action. He explains that the hand touching something reverses into an object that is being touched because the hand itself is tangible flesh. In shaking hands, we feel that we are touching another's hand and simultaneously that our extended hand is being touched. Analogously, Merleau-Ponty says that a see-er reverses into a visible object because of the thickness of its flesh. Thus, vision is analogous to exploring objects in the dark by tactile palpation. Visual palpation by looking inevitably accompanies a sense of being seen at the same time. He writes that painters often feel as if the objects in their own paintings gaze back at them. There are silent exchanges
between the see-ers and the objects. Because flesh is tactile as well as visible, it can touch as well as be touched and can see as well as be seen. There is flux in the reciprocal network that is body and world, involving touching, vision, seeing, and things tangible. Let's take another example. Imagine that your right hand touches your left hand while it is palpating something. At this moment of touching, the subjective world of touching transforms into the objective world of being touched. Merleau-Ponty wrote that, in this sense, "the touching subject passes over to the rank of the touched, descends into the things, such that the touch is formed in the midst of the world and as it were in the things" (Merleau-Ponty, 1968, pp. 133–134). Although the subject of touching and the object of being touched are opposite in meaning, they are rendered identical when Merleau-Ponty's concept of chiasm is applied. Chiasm, originating from the Greek letter χ (chi), is a rhetorical method of locating words by crossing over, combining subjective experience and objective existence. Although the concept might become a little difficult from here onward, let's imagine a situation in which a person who has language to describe only two-dimensional objects happens to encounter a novel object, a column, as a three-dimensional object, as Tohru Tani (1998) suggests. By exploring the object from different viewpoints, such as from the top or side, he would say that this circular column could be a rectangular one and this rectangular column could be a circular one (Figure 3.3). When this is written in the form of chiasm, it is expressed as:

[This circle is a rectangle.] X [This rectangle is a circle.]

Thus, the two-dimensional world is extended to a three-dimensional one in which a circle and a rectangle turn out to be just different views of the column. The conflict between the two is resolved by means of creating an additional dimension that supports their identity at a deeper level. Let's consider then what could be created, or emerge, in the following cases:

[A subject of touching is an object of being touched.] X [An object of being touched is a subject of touching.]

[A see-er is a visible object.] X [A visible object is a see-er.]

Merleau-Ponty suggests that embodiment as an additional dimension emerges, in which flesh of the same tangibility as well as the same thickness can be given to both the subject of touching or seeing and the object of being touched or being seen.
Figure 3.3. A person who has language to describe only two-dimensional objects happens to encounter a novel object, a column, as a three-dimensional object.
This dimension of embodiment can facilitate the space for iterative exchanges between the two poles of subject and object:

There is a circle of the touched and the touching, the touched takes hold of the touching; there is a circle of the visible and the seeing, the seeing is not without visible existence. … My body as a visible thing is contained within the full spectacle. But my seeing body subtends this visible body, and all the visibles with it. There is reciprocal insertion and intertwining of one in the other (Merleau-Ponty, 1968, p. 143).

Merleau-Ponty, in exploring the two poles of subjectivity and objectivity, did not anchor his thoughts in ambiguity midway between these two extremes, but rather allowed them to move dynamically between the two. By positioning the poles to face each other, he would have imagined a flux, a flow, from one pole to the other and an intertwining of the two in the course of resolving the apparent conflicts between them in the medium of embodiment. When the flux intertwines the two, the subject and the object become an inseparable being reciprocally inserted into each other, with the world arising in the gap.
Recently, these thoughts on embodiment have been revived and have exerted significant influence on cognitive science through the rising "embodied minds" paradigm in the philosophy of mind and the cognitive sciences (Varela, Thompson & Rosch, 1991; Clark, 1998; Ritter et al., 2000; O'Regan & Noë, 2001). Indeed, a new movement, referred to as the behavior-based approach (Brooks, 1990) in artificial intelligence and robotics, started under this trend, as is repeatedly encountered in later chapters. Let's move on now to an examination of the concept of the stream of consciousness put forward by the pioneering American psychologist and philosopher William James (1892) more than half a century before Merleau-Ponty. As we go, we'll find some connection between James' thinking and Husserl's concept of time perception, especially that of the level of absolute flow. Also, we'll see a certain affinity between this notion and Merleau-Ponty's in his attempt to show the immanent dynamics of our inner phenomena. By examining James' stream of consciousness, we can move closer toward answering how our will might be free.
3.6. Stream of Consciousness and Free Will

We experience our conscious states of mind as thoughts, images, feelings, and desires that flow while they constantly change. James defines his notion of the stream of consciousness as the inner coherence or unity of conscious states as they proceed from one to the next. He explains the four essential characteristics of this stream in his monumental Principles of Psychology (1918, p. 225) as follows:

1. Every "state" tends to be part of a personal consciousness.
2. Within each personal consciousness states are always changing.
3. Each personal consciousness is sensibly continuous.
4. It is interested in some parts of its object to the exclusion of others, and welcomes or rejects—chooses from among them, in a word—all the while.

The first characteristic means that the various states comprising the stream are ultimately subjective matters that the subjects feel they
experience by themselves. In other words, the subjects can keep them private in their states of mind. The second characteristic, one of the most important of James' claims, asserts that although the stream preserves its inner coherence as one stream, its states are constantly changing autonomously as various thoughts and images are generated. James writes that:

When we take a general view of the wonderful stream of our consciousness, what strikes us first is the different pace of its parts. Like a bird's life, it seems to be made of an alternation of flights and perchings (James, 1918, p. 243).

James considers that the stream comprises successions of substantive parts of stable "perchings" and transitive "flights." Conscious states of thoughts and images appear more stably in the substantive parts. On the other hand, the transitive parts generate successive transitions from one substantive part to another in temporal association. This alternation between the two parts takes place only intermittently, and the duration of each substantive part can be quite different, but only in terms of the subjective feeling of time. Here, we can find a structural similarity to what Tohru Tani (1998) interpreted as consciousness flowing as well as stagnating when referring to Husserl's flow of absolute consciousness. Although it is said that the transitive parts function to connect and relate various thoughts and images, how are they actually felt phenomenally? James describes them as a subtle feeling, like when, immediately after hearing someone say something, a relevant image is about to pop into the mind but is not yet quite fully formed. Because the thoughts and images are so faint, they are lost if we attempt to catch them. The transitive parts are the fringes of stable images that relate them to each other, where information flows like the free water of consciousness around these images. James considers these fringes to be more essential than stable images, and holds that the actual stream of consciousness is generated by means of tensional dynamics between stable images related to each other by their fringes. The third observation suggests that the private states of consciousness constantly change, but only continuously so. James says that consciousness is not like chopped up bits or jointed segments but rather flows like a river. This statement appears to conflict with the concept of time perception at the objective time level put forward by Husserl, because he considered that objective time comprises sequences of discrete objects
and events. However, James' idea is analogous to the absolute flow level, as mentioned before. I suspect that James limited his observation of the stream of consciousness to the level of pure experience and did not proceed to observation of a higher level such as Husserl's objective time. We can consider then that the notion of the stream of consciousness evolved from James' notion of present existence characterized by continuous flow to Husserl's notion of recall or reconstruction with trains of segmented objects. Alongside this discussion, from the notion of the sensible continuity of the stream of consciousness we can see another essential consequence of James' thought: that the continuous generation of the next state of mind from the current one endows a feeling that each state in the stream belongs to a single enduring self. The experience of selfhood—the feeling of myself from the past to the present as belonging to the same self—might arise from the sensible continuity of the conscious state. Finally, the fourth observation professes that our consciousness attends to a particular part of the experiences in the stream. Or, that consciousness brings forth some part of a whole as its object of attention. Heidegger (1962) attends to this under the heading of "attunement," and James' observations of this aspect of the stream of consciousness lead to his conception of free will. Free will is the capability of an agent to choose freely, by itself, a course of action from among multiple alternatives. However, the essential question concerning free will is: if we suppose that everything proceeds deterministically by following the laws of physics, what is left that enables our will to be free? According to Thomas Hobbes, a materialist philosopher, "voluntary" actions are compatible with strict logical and physical determinism, wherein "the cause of the will is not the will itself, but something else which is not disposed of it" (Molesworth, 1841, p. 376). He considers that the will is not in fact free at all because voluntary actions, rather than being random and uncaused, have necessary causes. James proposed a possible model for free will that combines random and deterministic characteristics, the so-called two-stage model (James, 1884). In this model, multiple alternative possibilities are imagined with the help of some degree of randomness in the first stage, and then one possibility is chosen to be enacted through deterministic evaluation of the alternatives in the second stage. Then, how can these possible alternatives, in terms of courses of actions or images, be generated? James considers that all possibilities are learned by way of experience. He
says, “when a particular movement, having once occurred in a random, reflex, or involuntary way, has left an image of itself in the memory, then the movement can be desired again, proposed as an end, and deliberately willed” (James,1918, p. 487). He considers further that the iterative experiences of different movements result in connections and relations among various images of movements in memory. Then, multiple alternatives can be imagined as accidental generations with spontaneous variations from the memory that has been consolidated, and finally one of the alternatives is selected for actual enactment. These “accidental generations with spontaneous variations” might be better understood by recalling how James’ stream of consciousness is constituted. The stream is generated by transitions of thoughts and images embedded in the substantial part. When the memory holds complex relations or connections between images of past experiences, images can be regenerated with spontaneous variations into streams of consciousness (see Figure 3.4 for an illustration of his ideas). James considers that all of these thi ngs are mechanized by dynamics in the brain. He writes: Consider once again the analogy of the brain. We believe the brain to be an organ whose internal equilibrium is always in a state of change—the change affecting every par t. The pulses of change are
Figure 3.4. An interpretative illustration of James's thought accounting for how possible alternatives can be generated. Learning from various experiences forms a memory that has a relational structure among substantive images associated with actions. Multiple streams of action images can be generated with spontaneous variations of transitions from among the images embedded in the memory. One of those streams of action images is selected for actual generation.
Consider once again the analogy of the brain. We believe the brain to be an organ whose internal equilibrium is always in a state of change—the change affecting every part. The pulses of change are doubtless more violent in one place than in another, their rhythm more rapid at this time than at that. As in a kaleidoscope revolving at a uniform rate, although the figures are always rearranging themselves, … So in the brain the perpetual rearrangement must result in some forms of tension lingering relatively long, whilst others simply come and pass (James, 1892).

It is amazing that more than 100 years ago James had already developed such a dynamic view of brain processes. His thinking is compatible with today's cutting-edge views outlined in studies on neurodynamic modeling, as seen in later chapters.
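James's two-stage account is concrete enough to sketch in code. The Python fragment below is only an illustrative toy under assumptions of my own (the associative memory, its transition strengths, and the evaluation function are all invented for illustration; James of course specified none of this). Stage one imagines alternative streams of action images by weighted random transitions among learned images; stage two deterministically evaluates the alternatives and selects one for enactment.

    import random

    random.seed(0)

    # Hypothetical associative memory consolidated from past experience:
    # transition strengths between learned action images.
    memory = {
        "reach": {"grasp": 0.7, "wave": 0.3},
        "grasp": {"lift": 0.8, "release": 0.2},
        "lift":  {"drink": 0.6, "place": 0.4},
    }

    def imagine_stream(start, length=3):
        # Stage 1: generate one alternative with spontaneous (random) variation.
        stream, state = [start], start
        for _ in range(length):
            successors = memory.get(state)
            if not successors:
                break
            actions, weights = zip(*successors.items())
            state = random.choices(actions, weights=weights)[0]
            stream.append(state)
        return stream

    def evaluate(stream):
        # Stage 2: deterministic evaluation; any utility measure would do here.
        return sum(len(action) for action in stream)

    alternatives = [imagine_stream("reach") for _ in range(5)]
    enacted = max(alternatives, key=evaluate)
    print("imagined:", alternatives)
    print("enacted:", enacted)

The point of the sketch is only the division of labor: spontaneous variation supplies the alternatives, and a deterministic process disposes of them.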
3.7. Summary

Skepticism about the symbolist framework for representing the world, as put forward by traditional cognitive science and outlined in chapter 2, has led us in the present chapter to look to phenomenology for alternative views. Let's take stock of what we've covered. Phenomenology begins with an analysis of direct experiences that are not yet articulated by any ideas or thoughts. Husserl considers that objects and the world can exist because they can be meditated on, regardless of their corresponding existences in the physical world. Their representations are constituted by means of the intentionality of direct experiences, a process that entails consciousness. Although Husserl thinks that such representations are intended to be idealistic so as to be logically tractable, his thinking has been heavily criticized by Dreyfus and other modern philosophers. They claim that this inclination to ideality with logical formalism turned out to provide a foundation for the symbolist framework envisioned by current cognitive science. It was Heidegger who dramatically redirected phenomenology by returning to the problem of being. Focusing on the ways of being in everyday life, Heidegger explains through his notion of being-in-the-world that things can exist on account of the relational structure between them, for example, considering our usage of things. His thinking lies behind my early question, discussed in the introduction to this book, as to what the object of a refrigerator can actually mean to a robot when it names it a "refrigerator." The refrigerator should be judged not from its characteristic physical features but from the ways in which it is used,
such as for taking a chilled beer from it. Heidegger also says that such being is not particularly noticed in daily life, as we are submerged in relational structures, as usage becomes habit and habit proceeds smoothly. We become consciously aware of the individual being of the subject and the object only in the very moment of a breakdown in the purposeful relations between them; for example, when a carpenter mishits a nail in hammering, he notices that he himself, the hammer, and the nail are independent beings. In a similar way, when habits and conventions break down, no longer delivering anticipated success, the authentic individual engages in serious reflection on these past habits, transforms them, and thus lives proactively for his or her "ownmost" future alongside and with others with whom these habits and conventions are shared. Merleau-Ponty, who was influenced by Heidegger, examined bodies as ambiguous beings that are neither subject nor object. On Merleau-Ponty's account, when seeing is regarded as being seen and touching as being touched, these different modalities of sensation intertwine and their reentrance through embodiment is iterated. By means of such iterative processes, the subject and the object constitute an inseparable being, reciprocally inserted into each other in the course of resolving the apparent conflicts between them in the medium of embodiment. Recently, his thoughts on embodiment have been revived and have significantly influenced cognitive science through the rising "embodied minds" paradigm, such as that of Varela and his colleagues (Varela, Thompson & Rosch, 1991). We finished this chapter by reviewing how William James explained the inner phenomena of consciousness and free will. His dynamic stream of consciousness is generated by spontaneous variations of images from past experiences consolidated in memory. More than a century later, his ideas are still inspiring work in systems neuroscience. Do these thoughts deliberated by those philosophers suggest anything useful for building minds, though? Indeed, at the least we should keep in mind that action and perception interact in a complicated manner and that our minds should emerge via such nontrivial dynamic processes. The next chapter examines neuroscience approaches for exploring the underlying mechanisms of cognitive minds in biological brains.
4 Introducing the Brain and Brain Science
In the previous chapter, we saw that a phenomenological understanding of the mind has come from introspection and its expression through language. We understand the words used, intuitively or deliberatively, by matching them with our own experiences and images. This approach to understanding the mind, that of subjective reflection, is clearly an essential approach, and it is especially valuable when coupled with the vast knowledge that has been accumulated through other scientific approaches, such as neuroscience, which make use of modern technologies to help us understand how we think by understanding how the brain works. The approach that neuroscience, or brain science, takes is quite different from that of cognitive science and phenomenology because it rests on objective observation of biological phenomena in the brain. It attempts to explain biological mechanisms for various cognitive functions such as generating actions, recognizing visual objects, or recognizing and generating speech. However, readers should note that brain science is still at a relatively early stage of development and we have no definitive accounts of even basic mechanisms. What we do have is some evidence of what is happening in the brain, albeit in many cases the evidence is still conflicting.
What we have to do is build up the most likely construct for a theory of the brain by carefully examining and linking together all the pieces of evidence we have thus far accumulated, held against guiding insights into the phenomenology of the human condition, such as those left by James and Merleau-Ponty, while adding yet more experimental evidence in the confirmation or disputation of these guiding insights. In the process, further guiding insights may be generated, and research into the nature of the mind relative to the function of the brain will advance. The next section starts with a review of the current state of the art in brain science with a focus on the processes of visual recognition and action generation, essential for creating autonomous robots. First, the chapter provides a conventional explanation of each independently, and then covers recent views that argue that these two processes are effectively inseparable. At the end of this chapter, we introduce some ideas informed by our robotics experiments on how intentions for actions originate in (human and other animal, organic not artificial) brains.
4.1. Hierarchical Brain Mechanisms for Visual Recognition and Action Generation

This section explores how visual recognition and action generation can be achieved in brains by reviewing accumulated evidence. A special focus will be put on how those processes work with hierarchical organization in brains, because insights into this structure help to guide us in approaching outstanding questions in cognitive science, such as how compositional manipulations of sensory–motor patterns can be achieved, as well as how the direct experience of sensory–motor flow can be objectified.

4.1.1 Visual Recognition Through Hierarchy and Modularity

First, let us look at the visual recognition process. Visual recognition is probably the most examined brain function, because related neuronal processes can be investigated relatively easily in electrophysiological experiments with nonmoving, anesthetized animals. The visual stimulus enters the retina first, proceeds to the lateral geniculate nucleus in the thalamus, and then continues on to the primary visual cortex (V1).
Figure 4.1. Visual cortex of the macaque monkey showing the "what" and "where" pathways schematically. LIP: lateral intraparietal area; VIP: ventral intraparietal area; MST: medial superior temporal area; MT: middle temporal area; TEO, TE: inferior temporal areas.
One important characteristic assumed in the visual cortex, as well as in other sensory cortices, is its hierarchical and modular processing, which uses specific neuronal connectivity between local regions. Figure 4.1 shows the visual cortex of a macaque monkey in which the visual stimulus from the retina through the thalamus enters V1, located in the posterior part of the cortex. V1 is thought to be responsible for lower-end processing such as edge detection, using so-called columnar organization. The cortical columns for edge detection in V1 are arrayed for continuously changing orientation. The orientation of the perceived edge in the local receptive field is detected in a winner-take-all manner; that is, only the column best matching the edge orientation is activated (i.e., neurons in the column fire) and other columns become silent. After V1, the signal propagates to V2, where columns undertake slightly more complex tasks such as perceiving different orientations of line segments by detecting the end terminals of the line segments. After V2, the visual processing pathway branches into two: The ventral pathway reaches areas TEO and TE in the inferotemporal cortex, passing through V4, and the dorsal pathway reaches areas LIP and VIP in the parietal cortex, passing through the middle temporal area (MT) and medial superior temporal area (MST). The ventral branch is called the what pathway owing to its main involvement in object identification, and the latter is called the where pathway due to its involvement in information processing related to position and movement. Taking the case of the where pathway first, it is said that the MT detects the direction of object motion with a relatively small receptive field, whereas the MST detects background scenes with a larger receptive field. Because movements in the background scene are related to one's own body movements in many cases, the MST consequently detects self-movements.
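The winner-take-all selection among orientation columns described at the start of this subsection can be illustrated with a minimal sketch. The following Python fragment is a toy, not a biological simulation: the odd-symmetric edge filters standing in for V1 receptive fields, the four-column bank, and the hard argmax rule are all simplifying assumptions of mine.

    import numpy as np

    def edge_filter(theta, size=9):
        # A crude oriented edge detector standing in for a V1 receptive field.
        ys, xs = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
        u = xs * np.cos(theta) + ys * np.sin(theta)   # axis across the edge
        envelope = np.exp(-(xs**2 + ys**2) / (2.0 * (size / 4) ** 2))
        return np.sign(u) * envelope                  # odd-symmetric profile

    orientations_deg = [0, 45, 90, 135]               # preferred gradient directions
    columns = [edge_filter(np.deg2rad(d)) for d in orientations_deg]

    # A local patch containing a vertical edge (dark left, bright right).
    patch = np.repeat(np.linspace(-1.0, 1.0, 9)[None, :], 9, axis=0)

    responses = np.array([abs(np.sum(f * patch)) for f in columns])
    winner = int(np.argmax(responses))                # winner takes all
    activity = np.zeros_like(responses)
    activity[winner] = responses[winner]              # losing columns stay silent

    print("responses:", np.round(responses, 2))
    print("winning column (gradient direction, deg):", orientations_deg[winner])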
Figure 4.2. Cell responses to complex object features in area TE in the inferotemporal cortex. Columnar modular representation in the TE for complex visual objects. Redrawn from (Tanaka, 1993).
This information is then sent to areas such as the VIP and LIP in the parietal cortex. Cells in the VIP are multisensory neurons that often respond to both a visual stimulus and a somatosensory stimulus. For example, it has been found that some VIP neurons in macaque monkeys respond when the experimenter strokes the animal's face, and the same neurons fire when the experimenter shakes the monkey's hand in front of its face. As discussed later, many neurons in the parietal cortex integrate visual inputs with another modality of sensation (i.e., somatosensory, proprioceptive, or auditory). LIP neurons are involved in processing saccadic eye movements, enabling the visual localization of objects. In the case of the what pathway, cells in V4 respond to specific contours or simple object features. Cells in the TEO respond to both simple and complex object features, and cells in the TE respond only to complex object features. In terms of the visual processing that occurs in the inferotemporal cortex, inspiring observations were made by Keiji Tanaka (1993) when conducting single-unit recording1 in partially anesthetized monkeys while showing the animals a set of artificially created complex object features. Columnar representations were found in the TE for a set of complex object features, wherein most of the cells in the same column reacted to similar complex object features (Figure 4.2).
1. Single-unit recording is a method of measuring the electrophysiological responses of a single neuron using a microelectrode system.
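To make the winner-take-all scheme described above concrete, here is a minimal sketch in Python. It is a caricature, not a model of real V1 circuitry: the number of columns and the idealized cosine tuning curves are assumptions made purely for illustration.

```python
import numpy as np

def v1_winner_take_all(edge_orientation_deg, n_columns=18):
    """Caricature of columnar orientation detection in V1.

    Each 'column' prefers one orientation in [0, 180) degrees. The column
    whose tuning best matches the stimulus wins; all others stay silent.
    """
    preferred = np.arange(n_columns) * 180.0 / n_columns
    # Idealized tuning curve: response falls off with angular distance
    # (orientation has period 180 degrees, hence the factor of 2).
    responses = np.cos(2.0 * np.deg2rad(edge_orientation_deg - preferred))
    winner = int(np.argmax(responses))
    activity = np.zeros(n_columns)
    activity[winner] = 1.0  # winner-take-all: only the best-matching column fires
    return preferred[winner], activity

best, _ = v1_winner_take_all(47.0)
print(f"winning column prefers {best:.0f} degrees")  # -> 50 degrees
```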
Figure 4.3. Schematic illustration of visual perception in the what pathway.
For example, in a particular column that encodes starlike shapes, different cells may react to similar starlike shapes that have a different number of spines. This observation suggests that TE columns represent a set of complex object features discretely, like visual alphabets, but allow a range of modulation of complex object features within the column. It can be summarized, then, that visual perception of objects might be compositional in the what pathway, in the sense that a set of visual parts registered in a previous level of the hierarchy is spatially combined in the next level, as illustrated in Figure 4.3. In the first stage, in V1, edges are detected at each narrow local receptive field from the raw retinotopic image, and in V2 the edge segments are detected. In V4, with its larger receptive field, connected edge segments of continuously changing orientation are detected as a single contour curvature. Then, in the TEO, geometric combinations of contour curvatures in an again larger receptive field are detected as simple object features (some could be complex object features). Finally, in the TE, combinations of these object features are detected as complex object features. It seems that columns in each visual cortical area represent primitive features at each stage of visual processing. Furthermore, each primitive feature represented in a column might be parameterized for minor modulation by local cell firing patterns.

4.1.2 Counter Arguments

As mentioned in the beginning of this section, we must exercise some caution in interpreting actual brain mechanisms from the data available
to us thus far. Although the aforementioned compositional mechanisms for visual recognition were considered to utilize explicit representations of the visual parts stored in the local columns, with hierarchical manipulation of those parts from the lower level to the higher, the real mechanism may not be so simply mechanical but also highly contextual. There is accumulating evidence that neuronal responses in the local receptive field in early vision can be modulated contextually by means of lateral interactions with areas outside of the receptive field as well as through top-down feedback from higher levels. Although contours are thought to be perceivable only after V4 in the classical theory, Li and colleagues (2006) showed that, in monkeys performing a contour detection task, there was a close correlation between the responses of V1 neurons and the perceptual saliency of contours. Interestingly, they showed that the same visual contours elicited significantly weaker neuronal responses when they were not the objects of attention. They concluded that contours can be perceived even in V1 by using the contextual information available at this same level and the higher level. Kourtzi and colleagues (2003) provided corroborative evidence that early visual areas V1 and V2 respond to global rather than simple local features. It was argued that context modulation in the early visual cortex has a highly sophisticated nature, in effect putting the local features to which the cells respond into their full perceptual global context.

These experimental results were obtainable because of the use of awake animals rather than anesthetized ones during the recording. In electrophysiological experiments on the visual cortex, animals are usually anesthetized so as to avoid contamination of purely bottom-up perceptual signals with unnecessary top-down signals from higher order cognitive brain regions such as the prefrontal cortex. Contrary to this method, however, top-down signals seem to be equally as important as the bottom-up ones in understanding the hierarchy of vision. Rajesh Rao and Dana Ballard (1999) proposed so-called predictive coding as a model for hierarchical visual processing in which the top-down signal conveys a prediction from the higher level activity to the lower one, whereas the bottom-up signal conveys the prediction error signal from the lower level, which modulates the higher level activity. They argue that the visual recognition of complex objects is achieved via such interaction between these two pathways rather than merely through the bottom-up one. This insight is deeply important to the neurorobotic experiments to come.
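As a rough illustration of the predictive coding idea, the following toy sketch implements a single top-down/bottom-up loop. It is heavily simplified relative to Rao and Ballard's model: the dimensions, the learning rate, and the fixed random generative weights U are all assumptions made for illustration, and the full model also learns U and stacks such loops into a hierarchy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level predictive coding loop (after Rao & Ballard, 1999, simplified).
# Top-down: prediction = U @ r ; bottom-up: error = input - prediction.
n_input, n_hidden = 16, 4
U = rng.normal(scale=0.5, size=(n_input, n_hidden))  # generative (top-down) weights
x = rng.normal(size=n_input)                         # a "sensory" input patch
r = np.zeros(n_hidden)                               # higher-level activity

lr = 0.05
for step in range(200):
    prediction = U @ r         # top-down signal sent to the lower level
    error = x - prediction     # bottom-up prediction-error signal
    r += lr * U.T @ error      # higher level updated to better explain the input

print("residual prediction error:", np.linalg.norm(x - U @ r))
```

The loop is gradient descent on the squared prediction error, so the higher-level activity settles on whatever representation best predicts the input through the top-down weights; recognition emerges from the interaction of the two pathways rather than from the bottom-up sweep alone.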
The modularity of feature representation in the columnar organization is also questionable. Yen and colleagues (2007) made simultaneous recordings of multiple early visual cortex cells in cats while showing the animals movies containing scenes from daily life. What they found was that there is a substantially large heterogeneity in the responses of adjacent cells in the same columns. This finding obviously conflicts with the classical view that cells with similar response properties are clustered together in columns. They mention that visual cortex cells could have multiple response dimensions. To sum up, the presumption of strict hierarchical and modular processing in visual recognition might have to be reconsidered given the evidence accumulating as experimental setups become more realistic. The next subsection begins this process concerning action generation in the brain.

4.1.3 Action Generation Through Hierarchy

Understanding the brain mechanisms behind action generation is essential to our attempts at understanding how the mind works because actions tie the subjective mind to the objective world. It is generally thought that complex actions can be generated by moving through multiple stages of processing in different local areas in the brain, in a similar way to how visual perception is achieved. Figure 4.4 shows the main brain areas assumed to be involved in action generation in the cortex. The supplementary motor area (SMA) and the premotor cortex (PMC) are considered to sit at the top of the action generation hierarchy. Some researchers think that the prefrontal cortex may play a further, higher functional role in action generation, sitting as it does above the SMA or PMC, and we will return to this view later. It is generally held that the SMA is involved in organizing action programs for voluntary action sequences, whereas the PMC is involved in organizing action programs for sensory-guided action sequences. Because these areas have dense projections to the primary motor cortex (M1), the idea is that detailed motor patterns along with the motor program are generated in M1. Then, M1 sends the motor pattern signals via the pons and cerebellum to the spinal cord, which then sends out detailed motor commands to the corresponding muscles to finally initiate physical movement.
Figure 4.4. The main cortical areas involved in action generation include the primary motor cortex (M1), supplementary motor area (SMA), premotor cortex (PMC), and parietal cortex. The prefrontal cortex and inferior parietal cortex also play important roles.
As a seminal study of the primary motor cortex, Georgopoulos and colleagues (1982) found evidence in electrophysiological experiments in monkeys that the direction of hand movement or reaching behavior is encoded by a population of neural activities in M1. In the following, we review possible relationships between the SMA and M1 and between the PMC and M1.

4.1.4 Voluntary Sequential Movements in the Supplementary Motor Area

Considerable evidence suggests that hierarchical relations exist between the SMA and M1. One well-known example involves patients with alien hand syndrome due to lesions in the SMA. These patients tend to generate actions that completely bypass their consciousness. For example, when they see a comb, their hand reaches out to it and they comb their hair compulsively. It is essential to note that the skilled behaviors involved in combing their hair are completely intact. These patients act well; it is just that they seem unable to regulate their actions at will. By way of explanation, it is thought that the SMA might regulate the generation of skilled behaviors by placing inhibitory controls over M1, which encodes a set of basic movement patterns including the one for combing hair. So if this inhibitory control is attenuated by lesions in the SMA, the mere
perception of a comb could automatically trigger the movement pattern for the combing of hair stored in M1. Neurophysiological evidence for the encoding of voluntary sequential movement in the SMA was obtained in pioneering studies conducted by Tanji's group (Tanji & Shima, 1994; Shima & Tanji, 1998; Shima & Tanji, 2000). In these studies, monkeys were trained to regenerate a set of specific sequential movements involving a combination of three primitive movements: pulling, pushing, and turning a handle. In each sequence, the three primitive movements were connected in serial order with a specific time interval at each transition of movement. After the training, the monkeys were required to regenerate each learned sequential movement from memory without any sensory cues being given. In this way, the task can be regarded as memory driven rather than sensory reactive. In the unit recording in the SMA during the regeneration phase, three types of task-related cells were found. The first interesting finding was that 54 out of 206 recorded cells showed sequence-specific activities. Figure 4.5a shows raster plots of one of these 54 cells, in this case an SMA cell that was activated only before the sequence Turn-Pull-Push (lower panel) was initiated, not before other sequences such as Turn-Push-Pull (upper panel). It is interesting to note that it took a few seconds for the SMA cell to become fully activated before onset of the sequential movements and that the activation diminished immediately after onset of the first movement. It is assumed, therefore, that the cell is responsible for preparing the action program for the specific sequential movement. This contrasts with the situation observed in the M1 cell shown in the raster plot in Figure 4.5b. The M1 cell started to become active immediately before the onset of the specific movement and became fully activated during the actual movement itself. The preparatory period of this M1 cell was quite short, within a fraction of a second. Tanji and Shima's results imply that some portion of SMA cells play an essential role in the generation of compositional actions by sequentially combining primitive movements. These cells might encode whole sequences as abstract action programs with slowly changing activation profiles during the preparatory period. This activity might then lead to the activation of other SMA cells that can induce specific transitions from one movement to another during run time by activating particular M1 cells, as well as SMA cells that encode corresponding movements with rapidly changing activation profiles.
Figure 4.5. Raster plots showing cell firing over multiple trials (upper part of each panel) and the mean firing rate across those trials (lower part) in the supplementary motor area (SMA) and primary motor cortex (M1) during trained sequential movements. (a) An SMA cell activated only in the preparatory period for initiating the Turn-Pull-Push sequence shown in the bottom panel, not for other sequences such as the Turn-Push-Pull sequence shown in the top panel. (b) An M1 cell encoding the single Push movement. Adapted from Tanji and Shima (1994) with permission.
Here, we can assume a certain spatiotemporal structure that affords hierarchical organization of sequential movements. In later work, Shima and Tanji (2000) reported further important findings from more detailed recording in a similar task protocol. Some cells were found to play multiple functional roles: Some SMA cells encoded not only a single specific motor sequence but two or three different sequences out of the four trained sequences. This suggests the interesting neuroscientific result that a set of primitive sequences is represented by distributed activation of some SMA cells rather than each sequence being represented exclusively and uniquely by specific cells. Although evidence acquired by various brain measuring techniques supports the notion that hierarchical organization of voluntary sequential movements occurs in the SMA for abstract sequence processing and in M1 for detailed movement patterns, this view is not yet set in stone. Valuable challenges have arisen against the idea of the SMA encoding abstract sequences. Lu and Ashe (2005) recorded M1 cell activity
during sequential arm movements in monkeys. In the task, each arm movement was either downward, upward, toward the left, or toward the right. It was found that the neural activity of some M1 cells immediately before onset of the sequential movements "anticipated" the coming sequences, and that 40% of the recorded M1 cells could do this. Surprisingly, this percentage is much higher than that observed in the SMA by Tanji and Shima. Are the sequence-related activities of M1 cells merely epiphenomena that reflect the activity of SMA cells upstream, or do they actually function to initiate corresponding motor sequences? Lu and Ashe dispelled any doubt about the answer by demonstrating that a lesion among the M1 cells, artificially created by microinjection of chemicals, degraded only the generation of sequences, not each movement. It seems then that M1 cells primarily encode sequences rather than each movement, at least in the monkeys and cells involved in Lu and Ashe's experiment.

4.1.5 Sensory-Guided Actions in the Premotor Cortex

The SMA is considered by most to be responsible for organizing complex actions such as sequential movements based on internal motivation, whereas the PMC is considered to generate actions in a more externally driven manner by making use of immediate sensory information. Mushiake in Tanji's group showed clear neurophysiological evidence for this dissociation (Mushiake et al., 1991). They trained monkeys to generate sequential movements under two different conditions: the internal motivation condition, in which the monkeys remembered sequential movements and reproduced them from memory, and the external sensory-driven condition, in which the monkeys generated sequential movements guided by given visual cues. Unit recording in both the SMA and PMC during these two task conditions revealed a distinct difference in the functional roles of these two regions. During both the premovement and movement periods, PMC neurons were more active when the task was visually guided and SMA neurons were more active when the sequence was self-determined from memorized sequential movements. It is known that there are so-called bimodal neurons in the PMC that respond both to specific visual stimuli and to one's own movement patterns. These bimodal neurons in the PMC associated with visual movement are said to receive "what" information from the inferotemporal cortex and "where" information from the parietal
cortex. Thus, these bimodal neurons seem to enable the PMC to organize sensory-guided complex actions. Graziano and colleagues (2002) demonstrated related findings in their local stimulation experiments on the monkey cortex. However, in some aspects, their experimental results conflict with the conventional idea that M1 encodes simple motor patterns such as directional movements or reaching actions, as shown by Georgopoulos and colleagues. They stimulated motor-related cortical regions with an electric current and recorded the corresponding movement trajectories of the limbs. Some stimuli generated movements involved in reaching to specific parts of the monkey's own body, including the ipsilateral arm, mouth, and chest, whereas others generated movements involving reaching toward external spaces. They found some topologically preserved mapping from sites over a large area including M1 and the PMC to the generated reaching postures. The hand reached toward the lower space when the dorsal sites in the region were stimulated, for example, but reached toward the upper space when the ventral and anterior sites were stimulated. It was also found that many of those neurons were bimodal neurons exhibiting responses also to sensory stimuli. Given these results, Graziano and colleagues have adopted a different view from the conventional one in that they believe that functional specification is topologically parameterized as a large single map, rather than there being separate subdivisions such as M1, the PMC, and the SMA that are responsible for differentiable aspects of motor-related functions in a more piecemeal fashion. So far, some textbookish evidence has been introduced to account for the hierarchical organization of motor generation, whereby M1 seems to encode primitive movements, and the SMA and PMC are together responsible for the more macroscopic manipulation of these primitives. At the same time, some counterevidence was introduced that M1 cells function to sequence primitives, as if no explicit differences might exist between M1 and the PMC. Some evidence was also presented indicating that many neurons in the motor cortices are actually bimodal neurons that participate not only in motor action generation but also in sensory perception. The next section explores an alternative view accounting for action generation mechanisms, which has recently emerged from observation of bimodal neurons that seem to integrate these two processes of action generation and recognition.
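Before moving on, the population coding reported by Georgopoulos and colleagues can be made concrete with a short sketch. Each cell "votes" for its preferred direction in proportion to its firing rate, and the vector sum of the votes recovers the movement direction. The cosine tuning curves and cell count below are illustrative assumptions, not recorded data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Population-vector readout (after Georgopoulos et al., 1982), toy version.
n_cells = 100
preferred = rng.uniform(0, 2 * np.pi, n_cells)   # each cell's preferred direction

def firing_rates(movement_dir):
    """Idealized cosine tuning: rate peaks at the cell's preferred direction."""
    return np.maximum(0.0, np.cos(movement_dir - preferred))

def population_vector(rates):
    """Each cell votes for its preferred direction, weighted by its rate."""
    x = np.sum(rates * np.cos(preferred))
    y = np.sum(rates * np.sin(preferred))
    return np.arctan2(y, x)

true_dir = np.deg2rad(120.0)
decoded = population_vector(firing_rates(true_dir))
print(f"true 120.0 deg, decoded {np.rad2deg(decoded):.1f} deg")
```

No single cell carries the answer; the direction is recoverable only from the collective activity, which is the sense in which movement direction is said to be population coded.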
4.2. A New Understanding of Action Generation and Recognition in the Brain

This book has alluded a number of times to the fact that perception of sensory inputs and generation of motor outputs might best be regarded as two sides of the same coin. In one way, we may think that a motor behavior is generated in response to a particular sensory input. However, in the case of voluntary action, intended behaviors performed by bodies acting on environments necessarily result in changes in proprioceptive, tactile, visual, and auditory perceptions. Putting the two together, a subject should be able to anticipate the perceptual outcomes of his or her own intended actions if similar actions are repeated under similar conditions. Indeed, the developmental psychologists Eleanor Gibson and Anne Pick have emphasized the role of perception in action generation. They wrote in their seminal book (2000) that infants are active learners who perceptually engage their environments and extract information from them. In their ecological approach, learning an action is not just about learning a motor command sequence. Rather, it involves learning possible perceptual structures extracted during intentional interactions with the environment. Indeed, actions might be represented in terms of an expectation of the resultant perceptual sequences caused by those intended actions. For example, when I reach for my mug of coffee, the action might be represented by a particular sequence of proprioception for my hand to make the preshape for grasping, as well as a particular sequence of visual perception of my hand approaching the mug, with a specific expectation related to the moment of touching it. The eminent neuroscientist Walter Freeman (2000) argues that action generation can be regarded as a proactive process by supposing this sort of action–perception cycle, rather than as the more passive, conventional perception–action cycle whereby motor behaviors are generated in response to perception.

Keeping these arguments in mind, this chapter starts by examining the functional roles of the parietal cortex, as this area appears to be the exact place where the top-down perceptual image for action intention originating in the frontal area meets the perceptual reality originating bottom-up from the various peripheral sensory areas. Thus located, the parietal cortex may play an essential role in mediating between the two, top and bottom. It then examines in detail the so-called mirror neurons that are thought to be essential to both the generation
and the perceptual recognition of actions. It is said that the finding of mirror neurons drastically changed our understanding of the brain mechanisms related to action generation and recognition. Finally, the chapter rounds out by looking at neural correlates of intentions, or "will," which are thought to be initiated farthest upstream in the actional brain networks, by examining some evidence from neuroscience bearing on the nature of free will.

4.2.1 The Parietal Cortex: Where Action Intention and Perceptual Outcome Meet

The previous section (4.1) discussed the what and where pathways in visual processing. Today, many researchers refer to the where pathway that stretches from V1 to the parietal cortex as the how pathway because recent evidence suggests that it is related more to behavior generation that makes use of multimodal sensory information than merely to spatial visual perception. Mel Goodale, David Milner, and colleagues (1991) conducted a series of investigations on patient D. F., who had visual agnosia, a severe disorder of visual recognition. When she was asked to name some household items, she misnamed them, calling a cup an ashtray or a fork a knife. However, when she was asked to pick up a pen from the table, she could do it smoothly. In this sense, the case of D. F. is very similar to that of Merleau-Ponty's patient Schneider (see chapter 3). Goodale and Milner tested D. F.'s ability to perceive the three-dimensional orientation of objects. Later, D. F. was found to have bilateral lesions in the ventral what pathway, but not in the dorsal how pathway reaching the parietal cortex. This implies that D. F. could not recognize three-dimensional objects visually using information about their category, size, and orientation because her ventral what pathway, including the inferotemporal cortex, was damaged. She could, however, generate visually guided behaviors without conscious perception of objects. This was possible because her dorsal pathway, including the parietal cortex, was intact. Thus, the parietal cortex appears to be involved in how to manipulate visual objects, by allowing a close interaction between motor components and sensory components. That the parietal cortex is involved in the generation of skilled behaviors by integrating vision-related and motor-related processes is a notion supported by the findings of electrophysiological experiments, especially those concerning bimodal neurons in the parietal cortex of the
monkey during visually guided object manipulation. Hideo Sakata and colleagues (1995) identified populations of neurons that fire both when pushing a switch and when visually fixating on it. Skilled object manipulation behaviors such as pushing a switch should require an association between the visual information about the object itself and the motor outputs required for acting on it, and so, by extension, some populations of parietal cortex neurons should participate in this association by accessing both modalities of information. Damage to the parietal cortex in humans, such as that caused by cerebral hemorrhage due to stroke or trauma, for instance, can result in various deficits in the skilled behavior needed for tool use. In the disorder ideational apraxia, individuals cannot understand how to use tools: If they are given a comb, they might try to brush their teeth with it. In ideomotor apraxia, individuals have difficulty particularly with miming: When asked to mime using a knife, they might knock on the table with their fist, or when asked to mime picking up tiny grains of rice, they move their hand toward the imagined grains but with it wide open. These clinical observations suggest that the parietal cortex might store some forms of knowledge, or "models," about the external world (e.g., objects, tools, and the surrounding workspace), and that through these models various mental images about possible interactions with the external world can be composed.

How can the skills or knowledge for object manipulation, or tool usage, be mechanized in the parietal cortex? Such skills would seem to require not only motor pattern generation but also proactive representation of the perceptual image associated with the motor act. Although the parietal cortex is conventionally seen as being responsible for integrating input from multiple sensory modalities, an increasing number of recent studies suggest that the parietal cortex might participate in predicting perceptual inputs associated with behaviors by acquiring some type of internal model (Sirigu et al., 1996; Eskandar & Assad, 1999; Desmurget & Grafton, 2000; Ehrsson et al., 2003; Mulliken et al., 2008; Bor & Seth, 2012). In particular, Mulliken and colleagues (2008) found direct evidence for the existence of a predictive model in the parietal cortex in their unit recording experiment involving monkeys performing a joystick task to control a cursor. They found that specific cells in the parietal cortex encode temporal estimates of the direction in which the cursor is moving, estimates that cannot be obtained directly from either the current sensory inputs or the motor outputs to the joystick, but can be obtained by forward prediction.
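The "forward prediction" invoked here has a simple computational reading: combine the current state with an efference copy of the outgoing motor command to predict the next sensory state before any new sensory input arrives. The following sketch is a toy linear forward model for a cursor task in this spirit; the dynamics matrices and time step are invented for illustration and are assumed to have already been learned from experience.

```python
import numpy as np

# Toy linear forward model: next_state = A @ state + B @ command.
# A and B are assumed to have been learned already from experience.
dt = 0.05
A = np.eye(2)           # cursor position carries over between time steps
B = dt * np.eye(2)      # the motor command displaces the cursor

def forward_model(state, command):
    """Predict the next cursor position from the efference copy alone."""
    return A @ state + B @ command

state = np.array([0.0, 0.0])        # current cursor position
command = np.array([1.0, 0.5])      # outgoing motor command (efference copy)

predicted = forward_model(state, command)
angle = np.degrees(np.arctan2(predicted[1], predicted[0]))
print(f"predicted movement direction: {angle:.1f} degrees")  # ~26.6
```

The point of the illustration is that the predicted direction is available internally, ahead of the sensory feedback, which is exactly the kind of estimate Mulliken and colleagues report the parietal cells encoding.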
Now, let's consider how predicting perceptual sequences could facilitate the generation of skilled actions in the parietal cortex. Some researchers have considered that a predictive model referred to as the forward model, assumed to operate in the cerebellum, might also help us to understand what is happening in the parietal cortex. Masao Ito, who is famed for his discovery of long-term depression in the cerebellum, suggested that the cerebellum might host internal models for action (Ito, 1970). Following Ito's idea, Mitsuo Kawato and Daniel Wolpert constructed detailed forward models (computational models) that account for optimal control of arm movements (Kawato, 1990; Wolpert & Kawato, 1998). The forward model basically predicts how the current sensory inputs change in the next time step for arbitrary motor commands given in the current time step. In the case of arm movement control, the forward model predicts changes in the angular positions of the arm joints as output when given joint motor torques as input. Adequate training of the forward model, based on iterated past experience of how joint angles change due to particular applied motor torques, can yield a good predictive model. More recently, Ito (2005) suggested that the forward model might be first acquired in the parietal cortex and further consolidated in the cerebellum later. In addition, Oztop, Kawato, and Arbib (2006) as well as Blakemore and Sirigu (2003) have suggested that both the parietal cortex and the cerebellum might host the forward model.

I, however, speculate that the predictive model in the parietal cortex may predict the perceptual outcome sequence corresponding not to motor commands at each moment but to macroscopic states of "intention" for actions that might be sent from a higher-order cognitive processing area such as the prefrontal cortex (Figure 4.6). For example, for a given intention of "throwing a basketball into a goal net," the corresponding visuo-proprioceptive flow, consisting of the proprioceptive trajectory of body posture change and the visual trajectory of the ball falling into the net, can be predicted. In a similar manner, such predictive models acquired by a skilled carpenter can predict the visuo-auditory-proprioceptive flow associated with an intention of "hitting a nail." These illustrations just follow the aforementioned thought of Gibson and Pick. The point here is that a predictive model may not need to predict the perceptual outcomes for all possible combinations of motor commands, including many unrealistic ones.
Figure 4.6. Predictive model in the parietal cortex. By receiving an intention for action from the prefrontal cortex, it predicts perceptual outcomes such as visuo-proprioceptive trajectories. Prediction of proprioception in terms of body posture results in the generation of the motor command sequences necessary for achieving it. The intention is modified in the direction of minimizing the mismatch between the prediction and the perceptual outcome.
If the predictive model attempts to learn to predict all possible motor command combinations, such an attempt will face a combinatorial explosion, which has been known as the "frame problem" (McCarthy, 1963) in AI research. Instead, a predictive model needs to predict possible perceptual trajectories associated only with a set of well-practiced, familiar actional intentions. Jeannerod (1994) has conjectured that individuals have so-called motor imagery for their well-practiced behaviors. Motor imagery is a mental process by which an individual imagines or simulates a given action without physically moving any body parts or sensing any signals from the outside world. The predictive model assumed in the parietal cortex can generate motor imagery by means of a look-ahead prediction of multimodal perceptual trajectories over a certain period. Indeed, Sirigu and colleagues (1996) compared healthy individuals, patients with damage to the primary motor area, and patients with damage to the parietal cortex, and reported that the patients with lesions in the parietal cortex showed selective impairment in generating motor imagery. If the predictive model just predicts perceptual sequences for given intentions for action, how can motor command sequences be obtained? It can be considered that a predicted body posture state in terms of anticipated proprioception might be sent to the premotor cortex or primary
motor cortex (M1) via the primary somatosensory cortex (S1) as a target posture to be achieved in the next time step. This information is further sent to the cerebellum, where the necessary motor commands or muscle forces to achieve this target posture might be composed. The target sensory signal could be a reaction force that is anticipated to be perceived, for example, in the thumb and index finger in the case of precisely grasping a small object. Again, the cerebellum might compute the necessary motor torque to be exerted on the thumb and finger joints in order to achieve the expected reaction force. This constitutes the top-down subjective intentional pathway acting on the objective world, as introduced through the brief review of phenomenology given in chapter 3.

Let's look next at the bottom-up recognition that is thought to be the counterpart to top-down prediction. The prediction of sensory modalities such as vision and tactile sensation that is projected to each peripheral sensory area through the top-down pathway might be compared with the actual outcome. When the visual or tactile sensation actually perceived differs from the predicted sensation, as in the situation described by Heidegger wherein the hammer misses hitting a nail (see chapter 3), the current intention of continuing to hit the nail would be shifted consciously to a different intention, such as looking for the mishit nail or searching for an unbroken hammer. If the mishit does not happen, however, everything will continue on automatically as expected, without any shifts occurring in the current intention. Such shifts in intentional states might be brought about through the mismatch error between prediction and perceptual reality. When such a mismatch is generated, the intention state may be updated in the direction of minimizing the mismatch error. As the consequence of interaction between these top-down and bottom-up processes, current intentions can be reformed in light of a changing situation or a misread environment. When action changes the perceptual reality from the one expected, the recognized perceptual reality alters the current intention. This aspect of top-down and bottom-up interaction is analogous to the predictive coding suggested for hierarchical visual processing by Rao and Ballard (see section 4.1). The obvious question to ask is whether or not the brain actually employs such intention adjustment mechanisms by monitoring the outcomes of its own predictions. There is some recent evidence to this effect based on human brain imaging techniques including functional magnetic resonance imaging (fMRI)
and electroencephalography (EEG). Both techniques are known to be good at measuring global brain activity and to complement one another, with relatively good spatial resolution from fMRI and good temporal resolution from EEG. These imaging studies have suggested that the temporoparietal junction (TPJ), where the temporal and parietal lobes meet, the inferior frontal cortex, and the SMA may all be involved in detecting mismatches between expected and actual perception in multimodal sensations (Downar et al., 2000; Balslev et al., 2005; Frith & Frith, 2012). It may be the TPJ that triggers adjustments in current action by detecting such mismatches (Frith & Frith, 2012). That said, it may be reasonable to consider the alternative: that interactions between top-down prediction with a specific intention and bottom-up modification of this intention take place in a web of local networks including the frontal cortex, the parietal cortex, and the various peripheral sensory areas, rather than in one specific local region. From this more distributed point of view, whatever regions are actually involved, it is the interactions between them that are indispensable in the organization of diverse intentional skilled actions in a changeable environment.

4.2.2 Returning to Merleau-Ponty

The concept behind the predictive model accords well with some of Merleau-Ponty's thinking, as described in chapter 3. In his analysis of a blind man walking with a stick, he writes that the stick can also be a part of the body when the man scans his surroundings by touching its tip to things. This phenomenon can be accounted for by the acquisition of a predictive model for the stick. During a lengthy period in which the man uses the same stick, he acquires a model through which he can anticipate how tactile sensation will propagate from the tip of the stick while touching things in his environment. Because of this unconscious anticipation, which we can think about in terms of Husserl's notion of protention (e.g., we would anticipate hearing the next note, "mi," when hearing "re" in "do-re-mi," as reviewed in chapter 3), and recalling Heidegger's treatment of equipment as extensions of native capacities for action, the stick could be felt to be a part of the body, provided that the anticipation agrees with the outcome. Related to this, Atsushi Iriki and colleagues (1996) made an important finding in their electrophysiological recording of the parietal cortex
in monkeys during a tool manipulation task. Monkeys confined to chairs were trained to use a rake to draw toward them small food objects located in front of them. After the training, neurons in the intraparietal sulcus, a part of the parietal cortex, were recorded in two phases: capturing the food without the rake and capturing the food with it. In the without-rake phase, they found that some bimodal neurons fired either when a tactile stimulus was given to the palm of the hand or when a visual stimulus approached the vicinity of the palm. It was shown that these particular neurons have a certain receptive field. Thus, each neuron fires only when the visual or tactile stimulus comes to a specific position relative to the palm (Figure 4.7a). Surprisingly, in the with-rake phase, the same neurons fired when the visual stimulus approached the vicinity of the rake, thus demonstrating an extension of the visual receptive field to include the rake (Figure 4.7b). This shifting of the receptive field from the vicinity of the hand to that of the rake implies that the monkey perceives the rake as a part of the body when it is extended from the hand and purposefully employed, in the same way that the stick becomes a part of the body of the blind man. Monkeys thus seem to embody a predictive model that includes possible interactions between the rake and the food object. The phantom limb phenomenon described in chapter 3 can be understood as an opposite case to that of the blind man's stick.
Figure 4.7. The receptive field of neurons in the intraparietal sulcus (a) in the vicinity of the hand in the without-rake phase and (b) extended to cover the vicinity of the rake in the with-rake phase.
Even though the limb has been amputated, the predictive model for the limb might remain as a "familiar horizon," as Merleau-Ponty would say, which would generate the expectation of a sensory image corresponding to the current action intention sent to the phantom limb from the motor cortex. The psychosomatic treatment invented by Ramachandran and Blakeslee (1998) using the virtual-reality mirror box provided patients with fake visual feedback that an amputated hand was moving. This feedback to the predictive model would have evoked the proprioceptive image of "move" for the amputated limb by modifying the current intention from "freeze" to "move," which might result in the feeling of twitching that patients experience in phantom limbs. Merleau-Ponty held that synesthesia, wherein sensation in one modality unconsciously evokes perception in another, might originate from iterative interactions between multiple modalities of sensation and motor outputs by means of reentrant mechanisms established in the coupling between the world and us (see chapter 3). If we consider that the predictive model deals with the anticipation of multimodal sensations, it is not feasible to assume that each modality of sensation is anticipated independently. Instead, a shared structure should exist, or be organized, that can anticipate the incoming sensory flow from all of the modalities together. It is speculated that a dynamic structure such as this is composed of collective neuronal activity, and it makes sense to consider that the bimodal neurons found in the parietal cortex as well as in the premotor cortex might in part constitute such a structure.

In sum, then, the functional role of the parietal cortex in many ways reflects what Merleau-Ponty was pointing to in his philosophy of embodiment. Actually, the how pathway stretching through the parietal cortex is reminiscent of ambiguity in Merleau-Ponty's sense, as it is located midway between the visual cortex, which receives visual inputs from the objective world, and the prefrontal cortex, which provides executive control with subjective intention over the rest of the brain. Several fMRI studies of object manipulation and motor imagery for objects have shown significant activation in the inferior parietal cortex. Probably the goal of object manipulation propagates from the prefrontal cortex through the supplementary motor area to the parietal cortex via the top-down pathway, whereas perceptual reality during manipulation of the object propagates from the sensory cortices, including the visual cortex and the somatosensory cortex for tactile and proprioceptive sensation, via the bottom-up
pathway. Both of these pathways likely intermingle with each other, with close interaction occurring in the parietal cortex.

4.2.3 Mirror Neurons: Unifying the Generation and Recognition of Actions

Many researchers would agree that the discovery of mirror neurons by Rizzolatti's group in 1996 is one of the most important findings for systems neuroscience in recent decades. Personally, I find the idea of mirror neurons very appealing because it promises to explain how the two essential cognitive processes of generating and recognizing actions can be unified into a single system.

4.2.4 The Evidence for Mirror Neurons

In the mid-1990s, researchers in the Rizzolatti laboratory in Parma were investigating the activities of neurons in the ventral premotor area (PMv) in the control of hand and mouth movements in monkeys. They had found that these neurons fired when the monkey grasped food objects, and whenever they fired, electrodes activated electronic circuitry to give an audible beep. Serendipitously, one day a graduate student entered the lab with an ice cream cone in his hand, and every time he brought it to his lips, the system responded with a beep! The same neurons were firing both when the monkey grasped food objects and moved them to its mouth and when the monkey observed others doing a similar action. With a grad student bringing an ice cream cone to his mouth, mirror neurons were discovered! Figure 4.8 shows the firing activity of a mirror neuron responding to a particular self-generated action as well as to the same action performed by an experimenter. Figure 4.8a shows a PMv neuron firing as the monkey observes the experimenter grasping a piece of food. Here, we see that the firing of the neuron ceases as the experimenter moves the food toward the monkey. Then, the same neuron fires again when the monkey grasps the food given by the experimenter. In Figure 4.8b, it can be seen that the same neuron does not fire when the monkey observes the experimenter picking up the food with an (unfamiliar!) tool, but thereafter firing occurs as described for the rest of the sequence of events in (a).
Figure 4.8. How mirror neurons work. (a) Firing of a mirror neuron shown in raster plots and histograms in the two situations in which the monkey observes the experimenter grasp a piece of food (left) and thereafter when the monkey grasps the same piece of food (right). (b) The same mirror neuron does not fire when the monkey observes the experimenter pick up the food with a tool (left), but it fires again when the monkey grasps the same piece of food (right). Adapted from Rizzolatti et al. (1996) with permission.
Besides these "grasping neurons," they also found "holding neurons" and "tearing neurons" that functioned in the same way. There are two important characteristics of these mirror neurons. The first is that they encode entire goal-directed behaviors, not parts of them; that is, the grasping neurons do not fire when the monkey is just about to grasp the object. The second characteristic is that all the mirror neurons found in the monkey experiments are related to transitive actions toward objects. Mirror neurons in the monkey have so far not been found to respond to intransitive behaviors such as reaching the hand toward a part of the body. That said, it looks to be a different case for humans, as recent human fMRI imaging studies have found mirror systems also for intransitive actions (Rizzolatti & Craighero, 2004). Recent monkey experiments by Rizzolatti's group (Fogassi et al., 2005) have indicated that mirror neurons can be observed in the inferior parietal lobe (IPL) and that these function both to generate and to recognize goal-directed actions composed of sequences of elementary movements. In their experiments, monkeys were trained to perform two different goal-directed actions: to grasp pieces of food and then move them to their own mouths to eat, and to grasp solid objects (of the same size and shape as the food objects) and then place them into a cylinder. Interestingly, the
activation patterns of many IPL neurons while grasping the objects differ depending on the subsequent goal, namely to eat or to place, even though the kinematics of grasping in both cases are the same. Supplemental experiments confirmed that the activation preferences during grasping do not originate from differences in visual stimuli between food and a solid object, but from the difference between goals. This view is reinforced by the fact that the same IPL neurons fired when the monkeys observed the experimenters achieving the same goals. These IPL neurons can therefore also be regarded as mirror neurons. It is certainly interesting that mirror neuron involvement is not limited to the generation and recognition of simple actions, but also occurs with compositional goal-directed actions consisting of chains of elementary movements.

Recent imaging studies focusing on imitative behaviors have also identified mirror systems in humans. Imitation is considered to be cognitive behavior whereby an individual observes and replicates the behaviors of others. fMRI experimental results have shown that neural activation in the posterior part of the left inferior frontal gyrus as well as in the right superior temporal sulcus increases during imitation (Iacoboni et al., 1999). If we consider that the posterior part of the left inferior frontal gyrus (also called Broca's area) in humans is homologous to the PMv or F5 in monkeys, it is indeed feasible that these local sites could host mirror neurons in humans. Although it is still a matter of debate as to how much other animals, including nonhuman primates, dolphins, and parrots, can perform imitation, it is still widely held that the imitation capability uniquely evolved in humans has enabled them to acquire wider skills and knowledge about human-specific intellectual behaviors, including tool use and language. Michael Arbib (2012) has explored possible linkages between mirror neurons and human linguistic competency. Based on accounts of the evolutionary pathway from nonhuman primates to humans, he has developed the view that the involvement of mirror neurons in embodied experience grounds the brain structures that underlie language. He has hypothesized that what he calls the "human language-ready brain" rests on evolutionary developments in primates including mirror system processing (for skillful manual manipulation of objects, imitation of the manipulations performed by others, pantomime, and conventionalized manual gestures) that initiated the protosign system. He further proposed that the development of protosigns provided the scaffolding essential for protospeech in the evolution of protolanguage (Arbib, 2010).
This hypothesis is interesting in light of the fact that mirror neurons in human brains might be responsible for recognizing the intentions of others as expressed in language. Actually, researchers have examined this idea using various brain imaging techniques such as fMRI, positron emission tomography, and EEG. Hauk and colleagues (2004) showed in an fMRI experiment that reading action-related words with different "end effectors," namely "lick," "pick," and "kick," evoked neural activities in the motor areas that overlap with the local areas responsible for generating motor movements in the face, arm, and leg, respectively. More specifically, "lick" activated the sylvian fissure, "pick" activated the dorsolateral sites of the motor cortex, and "kick" activated the vertex and interhemispheric sulcus. Broca's area was activated for all three words. Tettamanti and colleagues (2005) observed similar types of activation patterns when their subjects listened to action-related sentences such as "I bite an apple," "I grasp a knife," and "I kick a ball." Taken together, these results suggest that understanding action-related words or sentences generates certain canonical activation patterns of mirror neurons, possibly in Broca's area, which in turn initiate corresponding activations in motor-related areas. These results also suggest that Broca's area might be a site of mirror neuronal activity in humans.

Vittorio Gallese and Alvin Goldman (1998) suggest that mirror neurons in humans play an essential role in theory of mind in social cognition. The theory of mind approach postulates that although the mental states of others are hidden from us, they can be inferred to some extent by applying naïve theories or causal rules about the mind to the observed behavior of others. They argue for a simulation theory, whereby the mental states of others are interpretable through mental simulations that adopt their perspective, by tracking or matching their states with states of one's own. If these aforementioned human cases are granted, it can be said that the mirror neuron system has played an indispensable role in the emergence of uniquely human cognitive competencies along evolutionary pathways, from manual object manipulation, to protolanguage, and to theory of mind.

4.2.5 How Might Mirror Neurons Work?

The reader may ask how the aforementioned mirror neural functions might be implemented in the brain. Let's consider the mirror neuron
mechanism in terms of the aforementioned predictive model (see Figure 4.6), which we assumed may be located in the parietal cortex. If we assume that mirror neurons encode intention for action, we can easily explain how a particular activation pattern of the mirror neurons can lead to the generation of one's own specific action and how recognition of the same action performed by others can lead to the same activation pattern in the mirror neurons. (Although Figure 4.6 assumed that the intention might be hosted somewhere in the prefrontal area, it could be hosted by mirror neurons present in this area, including Broca's area in humans.) In generating one's own actions, such as grasping a coffee cup, the expected perceptual sequences, in terms of the relative position, orientation, and posture of one's own hand in relation to the cup, are predicted by receiving inputs from the mirror neuron activation that represents the intentional state for this action. Different actions can be generated by receiving inputs of different mirror neuron activation patterns, whereby the mirror neurons function as a switcher between a set of intentional actions. Recognition of the same action performed by others can be achieved by utilizing the mismatch information as described previously. In the case of observing others grasp the coffee cup, the corresponding intentional state in terms of the mirror neuron activity pattern can be searched for such that the reconstructed perceptual sequence evoked by this intentional state best fits the actually perceived one in the coordinate system relative to the coffee cup, thereby minimizing the mismatch error. On this model, the recognition of others' actions causes one to feel as if one's own actions were being generated, due to the generation in the mirror neurons of motor imagery representing the same intentional state. This assumption accords exactly with what Gallese and Goldman (1998) suggested for mirror neurons in terms of simulation theory, as described previously. They suggested that mirror neuron discharge serves the purpose of retrodicting target mental states, moving backward from the observed action, thus representing a primitive version of a simulation heuristic that might underlie "mind-reading." We will come back to the idea of the predictive coding model for mirror neurons in greater detail as we turn to related robotics experiments in later chapters.
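A minimal sketch can make this recognition-by-search concrete. Below, each candidate "intention" indexes a stored predicted perceptual sequence, and recognizing an observed action means selecting the intention whose prediction minimizes the mismatch error. The stored trajectories and noise level are placeholders invented for illustration; models of the kind examined in later chapters perform this search by gradient descent in a continuous intention space rather than by enumerating a small discrete repertoire.

```python
import numpy as np

rng = np.random.default_rng(2)

# Predicted perceptual trajectories for three well-practiced intentions
# (each: 10 time steps x 4 perceptual dimensions; values are placeholders).
repertoire = {
    "grasp-cup": rng.normal(size=(10, 4)),
    "push-cup":  rng.normal(size=(10, 4)),
    "wave-hand": rng.normal(size=(10, 4)),
}

def recognize(observed):
    """Return the intention whose predicted sequence best fits the observation."""
    def mismatch(name):
        # Mean squared error between prediction and the observed trajectory.
        return np.mean((repertoire[name] - observed) ** 2)
    return min(repertoire, key=mismatch)

# Observe someone grasping the cup: the grasp trajectory plus sensory noise.
observed = repertoire["grasp-cup"] + 0.1 * rng.normal(size=(10, 4))
print(recognize(observed))   # -> grasp-cup
```

The same stored predictions serve both directions of use: driven top-down by an intention they generate an action (or motor imagery), and run against an observed trajectory they identify the intention behind it, which is the unification the mirror neuron findings point to.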
4.3. How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?

In this chapter so far, we have seen that voluntary actions might be generated by means of a top-down drive by an intention. The intention could be hosted by the mirror neurons or by other neurons in the prefrontal cortex. Wherever it is represented in the brain, we are left with the essential question of how an intention itself can be set or generated. This question is related to the problem of free will that was introduced in the description of William James's philosophy (see chapter 3). As he says, free will might be the capability of an agent to choose a course of action independently and freely from among multiple alternatives. The problem with free will then concerns its origin. If every aspect of free will can be explained by deterministic physical laws, there should be no space actually remaining for free will. Can our minds set intentions for actions absolutely freely, without any other causes? Can intentions shift from one to another spontaneously in a chain for generating various actions? Another interesting question concerns the issue of consciousness. If we can freely determine our actions, how can this determination be accompanied by consciousness? Or, more simply, how can I feel consciously that I have just determined to do one thing and not another? Although there have been no definitive answers to this philosophical question thus far, there have been some interesting experimental results showing possible neural correlates of intention and free will.

4.3.1 Searching for the Neural Correlates of Intention

I would like to introduce, first, the seminal study on conscious intention conducted by Benjamin Libet. In his experiments (Libet, 1985), subjects were asked to press a button with their right hands at whatever moment they wished while their EEG activity was recorded from their scalps. Libet was trying to measure the exact timing at which the subjects became conscious of their decision to initiate the button press action, which he called the "w-judgment" time. The subjects were asked to watch a rotating clock hand and to remember the exact position of the clock hand when they first felt the urge to move their hand to press the button. By asking the subjects to report the position after each button press
trial, the exact timing of their conscious intention to act could be measured for each trial. It was found that the average timing of conscious intent to act was 206 ms before the onset of muscle activity, whereas the readiness potential (RP), a buildup of brain activity measured by EEG, started 1 s before movement onset (Figure 4.9). This EEG activity was localized in the SMA. This is a somewhat surprising result because it implies that the voluntary action of pressing the button is not initiated by conscious intention but by unconscious brain activity, namely the readiness potential evoked in the SMA. At the very least, it demonstrates that one prepares to act before one decides to act. It should be noted, however, that Libet's experiment has drawn substantial criticism along with enthusiastic debates on the results. It is said that the subjective estimate of the time at which consciousness arises is not reliable (Haggard, 2008). Also, Trevena and Miller (2002) reported that many reported conscious decision times were before the onset of the lateralized readiness potential, which represents actual preparation for movement, as opposed to the RP, which represents contemplation of movement as a future possibility. However, it is also true that Libet's study has been replicated by others and further extended experiments have been conducted (Haggard, 2008).
Figure 4.9. The readiness potential, a buildup of brain activity prior to movement onset, recorded during a free decision task conducted by Libet (1985).
Soon and colleagues (2008) showed that this unconscious brain activity to initiate voluntary action begins much longer before the onset of physical action. By utilizing fMRI brain imaging, they demonstrated that brain activity is initiated in the frontopolar part of the prefrontal cortex and in the precuneus in the medial area of the superior parietal cortex up to 7 s before a conscious decision is made to select either pressing the left button with the left index finger or the right button with the right index finger. Moreover, the outcome of the motor decision to select between the two actions (a selection the subjects had not yet consciously made) could be predicted from this early brain activity, prior to reported consciousness of the selection.

4.3.2 How to Initiate Intentions and Become Consciously Aware

The experimental evidence provided by Libet and by Soon's group can be integrated to produce the following hypothesis. Brain activity for selecting a voluntary action is initiated unconsciously in the frontopolar part of the prefrontal cortex or in the precuneus in the parietal cortex from several seconds to 10 seconds before the onset of the corresponding physical movement, and is then transmitted downstream to the SMA 1 second before the movement, with consciousness of this intention to act arising only a few hundred milliseconds before movement onset. Controversially, this implies that there is no room left for free will, because the conscious intent that seemingly determines our next actions appears to actually be caused by preceding unconscious brain activities arising a long time before. If this is indeed true, it raises two fundamental questions. First, can we freely initiate unconscious brain activity in the frontopolar part of the prefrontal cortex or in the parietal cortex? And second, why do we feel conscious intention for voluntary action only at a very late stage of preparing for action, and what is the role of this conscious intention if it is not behind determining subsequent voluntary actions?

To address the first question, let's assume that the unconscious activity in the beginning might not be caused by anybody or anything, but may appear automatically, by itself, as an aspect of continuously changing brain dynamics. This notion relates to the "spontaneous generation of alternative images and thoughts" put forward by William James. As described previously (see Figure 3.4), when memory hosts complex relations or connections between images of past experiences, an image may be regenerated with spontaneous variations into streams of
consciousness. This idea of James leads to the conjecture that continuous transitions of images are generated spontaneously along trajectories of brain activation states, visiting first one image state and then another iteratively. Such spontaneous transitions can be accounted for by observations of autonomous dynamic shifts of firing patterns in collections of neurons in the absence of external stimulus inputs. Using an advanced optical imaging technique, Ikegaya and colleagues (2004) observed the activities of a large number of neurons in in vitro hippocampal tissue of rats. Their main finding concerns what the authors metaphorically call a "cortical song," wherein various spatiotemporally distributed firing patterns of collective neurons appear as "motifs" and shift from one to another spontaneously. Although these motifs seem to appear randomly in many cases, they often repeat in sequences exhibiting some regularity. Based on other work done by Churchland and colleagues (2010), we now also know how fluctuations in the activities of collective neurons in the PMC during the preparation of movements can affect the generation of succeeding actual movements. They recorded the simultaneous activities of 96 PMC cells of monkeys during the preparatory period for a go-cue-triggered visual target reaching task2 over many trials. First, they found that the trajectories of the collective neural activities could be projected from the 96 original dimensions onto a two-dimensional plane by a mathematical analysis similar to principal component analysis. They also found that those trajectories from the go cue until the onset of movement were mostly repeated across different trials in normal cue-response cases (Figure 4.10). An exception to the preceding schema was observed during preparatory periods leading to the generation of failure behaviors such as abnormally delayed responses. In such cases, the neural activation trajectories fluctuated significantly. Such fluctuating trajectories appeared even though the setting of each trial was identical. How, then, can such fluctuating activities of collective neurons occur? Freeman (2000) and many others have speculated that such spontaneous fluctuation might be generated by means of deterministic chaos developed in the neural activity either at the local neuronal circuit level or at larger
2. The animals were trained to reach a visually prespecified position immediately after a go-cue.
Figure 4.10. Overlay of 15 trajectories obtained by two-dimensional projection of the activities of 96 neurons in the dorsal premotor cortex area of a monkey during repeated trials of a visual target reaching task, (a) on one occasion and (b) on a different occasion. In both plots, trajectories for failure cases are shown with thick lines. Adapted from Churchland et al. (2010) with permission.
cortical area levels. These possibilities are explored in Chapter 10. To sum up then, continuous change in the cortical dynamical state might account for the spontaneous generation, without any external causes, of various intentions or images for next actions. The second question concerning why we become conscious of intention for voluntary action only at a very late stage of preparation for action remains difficult to answer at present. However, several reports on cortical electrical stimulation in human subjects might open a way to an answer. Desmurget and colleagues (2009) offer us two complementary pieces of evidence obtained in their cortical electrical stimulation study conducted in patients with brain tumors. The study employed perioperative brain stimulations with a bipolar electrode during awake surgery for tumor removal. Stimulations of the premotor cortex evoked overt mouth and contralateral limb movements. But what was interesting was that in the absence of visual feedback, the patients firmly denied making the movements they actually made; they were not consciously aware of the movements generated. Conversely, stimulation of the parietal cortex created an intention or desire in the patients to move. With stronger stimulation, they reported that they had moved their limbs even though they had not actually moved them. Given this result, Desmurget and colleagues speculated that the parietal cortex might mediate error monitoring between the predicted perceptual outcome for the intended action
and the actual one. (These results also imply that the depotentiation of the parietal cortex without an error signal signifies successful execution of the intended action.) Fried and colleagues (1991) reported results of direct stimulation of the presupplementary motor area in patients as part of neurosurgical evaluation. Stimulation at a low current elicited the urge to move a specific body part contralateral to the stimulated hemisphere. This urge to move the limbs is similar to a compulsive desire, and in fact the patients reported that they felt as if they were not the agents of the generated movements. In other words, this is a feeling of imminence for movements of specific body parts in specific ways. The patients could describe precisely the urges evoked; for example, that the left arm was about to move inward toward the body. This imminent intention for quite specific movements under stimulation of the presupplementary motor area contrasts with the case of parietal stimulation mentioned earlier, in which the patients felt a relatively weak desire or intention to move. Another difference between the two studies is that more intense stimulation tended to produce actual movement of the same body part when the presupplementary motor area, but not the parietal cortex, was stimulated. Putting all of this evidence together, we can create a hypothesis for how conscious intention to initiate actions is organized in the brain, as follows. The intention for action is built up from a vague intention to a concrete one by moving downward through the cortical hierarchy. In the first stage (several seconds before movement onset), the very early form of the intention is initiated by means of spontaneous neuronal state transitions in the prefrontal cortex, possibly in the frontopolar part as described by Soon and colleagues. At this stage, the intention generated might be too vague for its contents to be accessed, and therefore it would not be consciously accessible (beyond a general mood of anticipation, to recall Heidegger once again). Subsequently, the signal carrying this early form of intention is propagated to the parietal cortex, where a prediction of perceptual sequences based on this intention is generated. This idea follows the aforementioned assumption about functions of the parietal cortex shown in Figure 4.6. By generating a prediction of the overall profile of the action in terms of its accompanying perceptual sequence, the contents of the current intention become consciously accessible. Then, the next target position for movement predicted by the parietal cortex in terms of body posture or proprioceptive state is sent to the presupplementary motor area, where a specific motor
program for the required immediate movement is generated online. This process generates the feeling of imminence for movements of specific body parts in specific ways, as described by Fried and colleagues. The motor program is sent to the premotor cortex and primary motor cortex to generate the corresponding motor commands. This process is assumed to be essentially unconscious on the basis of the findings of Desmurget and colleagues (2009) mentioned earlier. A big "however" needs to follow this hypothesis, because it contains some unclear parts. First, this hypothesis conflicts on a number of points with evidence described thus far in this book, and the details of these conflicts are examined in the next section. Second, it has not yet been clarified how the contents of the current intention become consciously accessible in the parietal cortex in the process of predicting the resultant perceptual sequences. In relation to this problem, David Chalmers speculates that it is nontrivial to account for the quality of human experiences of consciousness in terms of neuroscience data alone. This is what he calls the hard problem of consciousness (Chalmers, 1995). This hard problem is contrasted with the so-called
easy problem, in which a target neural function can be understood by its reduction to processes of physical matter. Suppose that a set of neurons that fire only at conscious moments were successfully identified in subjects. There would still be no way to explain how the firings of these neurons result in the conscious experiences of the subjects. This is the "hard problem." Analogously, how can we account for causal relationships between consciousness of one's own actions and neural activity in the parietal cortex? This problem will be revisited repeatedly in later chapters, as it is central to this book. Next, however, we must look at some of the remaining open problems.
4.4. Deciding Among Conflicting Evidence

Let's remind ourselves of the functional role of the presupplementary motor area described by Tanji and Shima (Tanji & Shima, 1994; Shima & Tanji, 1998; Shima & Tanji, 2000). Their electrophysiological experiments with monkeys showed that this area includes neurons responsible for organizing sequences of primitive movements. However, their findings conflict with those of Fried and colleagues (1991), obtained by human brain
electrical stimulation. These researchers claim that the presupplementary motor area is responsible for generating merely the urge for imminent movements, not the expectation or desire for whole actions consisting of sequences of elemental movements. If Tanji and Shima's findings for the role of the presupplementary motor area in monkeys hold true for humans, then electrical stimulation of the presupplementary area in humans should likewise evoke desire or expectation for sequences of elementary movements. We'll come back to the possible role of the presupplementary motor area in human cognition in a moment. Another conflict concerns the functional role of the premotor cortex. Although the premotor cortex (F5 in monkeys) should host intentions or goals for the next actions to be generated according to the mirror neuron theory put forward by Rizzolatti's group (Rizzolatti et al., 1996), later experiments by Desmurget and Sirigu (Sirigu et al., 2003; Desmurget et al., 2009) suggest that it may not be the premotor cortex that is involved in conscious intention for action but the parietal cortex, as described in the previous section. In fact, Rizzolatti and colleagues (Fogassi et al., 2005) did later find mirror neurons in the parietal cortex of monkeys. These mirror neurons in the parietal cortex seem to encode intention for sequences of actions, both for one's own action sequence generation and while observing similar action generation by others. We may ask, then, whether some neurons not just in the premotor cortex but also in the parietal cortex fire as mirror neurons in the case of generating as well as recognizing single actions like grasping food objects, as described in the original mirror neuron paper (Rizzolatti et al., 1996). The puzzle we have here is the following. What is the primary area for generating voluntary actions? Is the presupplementary motor area to be considered the locus for generating voluntary action? Or is it the premotor cortex, the original mirror neuron site? Or is it the parietal cortex, responsible for the prediction of action-related perceptual sequences? Or, ultimately, is it the prefrontal cortex, as the center for executive control? Although it could be the supplementary motor cortex, premotor cortex, or parietal cortex, we simply cannot tell right now, as the evidence currently available to us is apparently contradictory. Finally, we might be disappointed that circuit-level mechanisms for the cognitive functions of interest are still not accounted for exactly by current brain research. Neuroscientists have taken a reductionist approach by pursuing possible neural correlates of all manner of things.
They have investigated mappings between neuronal activities in specific local brain areas and their possible functions, like the firing of presupplementary motor area cells in action sequencing or of mirror neurons in the premotor cortex in action generation and recognition, with the hope of clarifying some mechanisms at work in the mind and cognition. Although the accumulation of such evidence clearly serves to inspire us to imagine how the mind may arise from activity in the brain, such evidence cannot yet tell us the exact mechanisms underlying different types of subjective experience, at least not in a fine-grained way adequate to confirm one-to-one correlative mappings from the "what it feels like" to specific physical processes. How can the firings of specific cells in the presupplementary motor area mechanize the generation of corresponding action sequences? How can the firings of the same premotor cells, in terms of mirror neurons, mechanize both the generation of specific actions and the recognition of the same actions by others? What are the underlying circuit-level mechanisms accounting for both, as well as the feeling of witnessing either? In order to answer questions like these, we may need future technical breakthroughs in measurement methods, such as the simultaneous recording of a good number of neurons and their synaptic connectivity in target functional circuits, combined with modeling schemes of good quality.
4.5. Summary

This chapter explored how cognitive minds can be mechanized in biological brains by reviewing a set of empirical results. First, we reviewed general understandings of possible hierarchical architectures in visual recognition and motor action generation. In the visual pathway, earlier stages of the visual system (in the primary visual cortex) are thought to deal with the processing of detailed information in the retinotopic image, and later stages (in the inferior temporal cortex) with more abstract information processing. Thus, some have assumed that complex visual objects can be recognized by decomposition into specific spatial combinations of visual features represented at the lower level. The action generation pathway is also presumed to follow hierarchical processes. It is assumed that the supplementary motor area (SMA) and the premotor cortex (PMC) perform higher level coordination for generating voluntary action and sensory-guided action by sending control signals to the primary motor cortex (M1) at the lower level.
However, some conflicting evidence has arisen that does not support the existence of a rigid hierarchy in either visual recognition or action generation. So, we next examined a new way of conceiving of the processes at work, in which action generation and sensory recognition are inseparable. We found evidence for this new approach in the review of recent experimental studies focusing on the functional roles of the parietal cortex and of mirror neurons distributed through different regions of the brain. We entertained the hypothesis that the parietal cortex may host a predictive model that can anticipate perceptual outcomes for actional intention encoded in mirror neurons. It was also speculated that a particular perceptual sequence can be recognized by means of inferring the corresponding intention state, and that the predictive model can regenerate this sequence. A hallmark of this view is that action might be generated by the dense interaction of top-down proactive intention and bottom-up recognition of perceptual reality. Furthermore, we showed how this portrait is analogous to Merleau-Ponty's philosophy of embodiment. An essential question remained. How is intention itself set or generated? This question is related to the problem of free will. We reviewed findings that neural activities correlated with free decisions are initiated in various regions including the SMA, the prefrontal cortex, and the parietal cortex significantly before individuals become consciously aware of the decision. These findings raise two questions. The first concerns how "unconscious" neural activities for decisions are initiated in those regions. The second concerns why conscious awareness of free decisions is delayed. Although we have provided some possible accounts to address these questions, they are as yet speculative. Also in this chapter, we have found that neuroscientists have taken a reductionist approach by pursuing possible neural correlates of all manner of things. They have investigated mappings between neuronal activities in specific local brain areas and their possible functions. Although the accumulation of such evidence can serve to inspire us to hypothesize how the normally functioning brain results in the feeling of being conscious, neurological evidence alone cannot yet specify the mechanisms at work. And with this, we have seen that not one but many important questions about the nature of the mind remain to be answered. How might we see neural correlates of our conscious experience? Suppose that we were able to record all essential neuronal data, such as the connectivity, synaptic transmission efficiency,
and neuronal firings of all related local circuits in the future. Would this enable us to understand the mechanisms behind all of our phenomenological experiences? Probably not. Although we would find various interesting correlations in such massive datasets, like those between synaptic connectivity and neuronal firing patterns or between neuronal firing patterns and behavioral outcomes, they would still just be correlations, not proof of causal mechanisms. Can we understand the mechanisms of a computer's operating system (OS) just by putting electrodes at various locations on the motherboard circuits? We may obtain a bunch of correlated voltage data, but probably not enough to infer the principles behind the workings of a sophisticated OS. Taking seriously these limitations inherent to the empirical neuroscience approach, this book now begins to explore an alternative approach, a synthetic modeling approach that attempts to understand possible neuronal mechanisms underlying our cognitive brains by reconstructing them as dynamic artifacts. The synthetic modeling approach described in this book has two complementary focuses. The first is to use dynamical systems perspectives to understand various complicated mechanisms at work in cognition. The dynamical systems approach is effective in articulating circular causality, for instance. The second focus concerns the embodiment of cognitive processes, which was briefly described in the previous chapter. The role of embodiment in shaping cognition is crucial when causal links go beyond brains and establish circular causalities between bodies and their environments (e.g., Freeman, 2000). The next chapter provides an introductory account that considers such problems.
5 Dynamical Systems Approach for Modeling Embodied Cognition
Nobel laureate in physics Richard Feynman once wrote on the chalkboard during a lecture:
What I cannot create, I cannot understand.
— Richard Feynman1

Conversely, thus: I can understand what I can create. This seems to make sense because if we can synthesize something, we should know its organizing principles. By this line of reasoning, then, we might be able to understand the cognitive mind by synthesizing it. But how can we synthesize the mind? Basically, the plan is to put computer simulation models of the brain into robot heads and then examine how the robots behave, as well as how the neural activation states change dynamically in the artificial brains while the robots interact with the environment. The clear difficulty involved in doing this is how to build these brain models. Although we don't yet know their organizing principles exactly, we should begin by deriving the most likely ones through a thorough survey of results from neuroscience,
1. This statement was found on his blackboard at the time of his death in February 1988.
psychology, and cognitive science. In robotics experiments, we can examine the neural activation dynamics (of a brain model) and the behaviors (of such embrained robots) as the robots attempt to achieve the goals of cognitive tasks designed by experimenters. It is not trivial to anticipate, dare we say guess, what sorts of phenomena might be observed in such experiments, even when the principles used in engineering the relevant brain models are well defined. This comes from the fact that all interactions that occur within the model brains, as well as between them and the environment through circular causality, are dominated by nonlinear dynamics for which solutions cannot be obtained analytically. Rather, we should expect that such robotics experiments might evidence nontrivial phenomena that cannot be inferred from the formative principles themselves. If such emergent phenomena observed in experiments correspond to various bodies of work, including empirical observations in neuroscience, computational accounts in cognitive science, and reports from phenomenological reduction, the presumed principles behind the models would seem to hold. Moreover, it would be great if just a small set of principles in the model could account for numerous phenomena of the mind through their synthesis. This is the goal of the synthetic approach: to articulate the processes essential to cognition as we experience it, and ideally nothing more. Now, let's assume that the mind is a product of emergent processes appearing in the structural interactions between the brain and the environment by means of the sensory-motor coupling of a whole, embodied agent through behavior, wherein the mind is considered a nontrivial phenomenon appearing as a result of such interactions. This assumption refers to the embodied mind, or embodied cognition (Varela et al., 1991). Many phenomena emergent from embodied cognition can be efficiently described in the language of dynamical systems, as we will see. Subsections of the current chapter explore the idea of embodied cognition by visiting different approaches taken so far. These include psychological studies focusing on embodiment and "new-trend" artificial intelligence robotics studies exemplifying behavior-based robotics involving the synthesis of embodied cognition. Readers will see that some psychological views, especially Gibsonian and Neo-Gibsonian approaches, have been well incorporated into dynamical systems theories, and have thus provided useful insights guiding behavior-based robots and neurorobots. After this review, we will consider particular neural network models as
abstractions of brains, and then consider a set of neurorobotics studies using those models that demonstrate emergence through synthesis, capturing some of the essence of embodied cognition. First, however, the next section presents an introduction to the theory of dynamical systems that lays the groundwork for the synthetic modeling studies to follow. Readers should note that this is not the end of the story: Chapter 6 discusses some crucial ingredients for synthesizing the "mind" that have been missed in conventional studies on neural network modeling and behavior-based robotics. The first section provides an introductory tutorial on general ideas of dynamical systems.
5.1. Dynamical Systems

Here, I would like to start with a very intuitive explanation. Let's assume that there is a dynamical system, and suppose that this system can be described at any time as exhibiting an N-dimensional system state, where the ith dimensional value of the current state is given as x_t^i. When x_{t+1}^i, the ith dimensional value of the state at the next time step, can be determined solely from all the dimensional values at the current time step, the time development of the system can be described by the following difference equations (also called a "map"):
x_{t+1}^1 = g^1(x_t^1, x_t^2, …, x_t^N)
x_{t+1}^2 = g^2(x_t^1, x_t^2, …, x_t^N)
⋮
x_{t+1}^N = g^N(x_t^1, x_t^2, …, x_t^N)    (Eq. 1)
Here, the time development of the system state is obtained by iterating the mapping of the current state at t to the next state at t+1, starting from a given initial state. Eq. 1 can be rewritten with the N-dimensional state vector X_t, and with P as a set of parameters of interest that characterize the function G( ):
X_{t+1} = G(X_t, P)    (Eq. 2)
A given dynamical system is often investigated by examining changes in time-development trajectories versus changes in the representative
parameter set P. If the function G( ) in Eq. 2 is nonlinear, the trajectories of time development can become complex depending on the nonlinearity. In most cases, the time development of the state cannot be obtained analytically. It can be obtained only through numerical computation, as integration over time from a given initial state X_0, and this computation can be executed only with the use of modern digital computers. Dynamical systems can also be described by an ordinary differential equation in continuous time, with X as the vector of the system state, \dot{X} as the vector of the time derivative of the state (it can also be written as ∂X/∂t), and F( ) as a nonlinear dynamic function parameterized by P, as shown in Eq. 3.
\dot{X} = F(X, P)    (Eq. 3)
The exact trajectory in continuous time can likewise be obtained by integrating the time derivative from a given dynamical state at the initial time. The structure of a particular dynamical system is characterized by the configuration of attractors in the system, which determines the time evolution profiles of different states. Attractors are basins toward which trajectories of dynamical states converge. An attractor is called an invariant set because, after trajectories converge (perhaps after infinite time), they become invariant trajectories. That is, they are no longer variable and are instead determined, representing stable state behaviors characterizing the system. Outside of attractors or invariant sets, on the other hand, are transient states wherein trajectories are variable. Attractors can be roughly categorized into four types, as shown in Figure 5.1a-d. The easiest attractor to envision is a fixed-point attractor, in which all dynamic states converge to a point (Figure 5.1a). The second is a limit-cycle attractor (Figure 5.1b), in which the trajectory converges to a cyclic oscillation pattern with constant periodicity. The third is a limit torus, which appears when more than one frequency is involved in the periodic trajectory of the system and two of these frequencies form an irrational ratio. In this case, the trajectory is no longer closed and exhibits quasi-periodicity (Figure 5.1c). The fourth is a chaotic attractor (a "strange attractor"), in which the trajectory exhibits infinite periodicity and thereby forms fractal structures (Figure 5.1d). Finally, in some cases multiple local attractors can coexist in the same state space, as illustrated in Figure 5.1e. In such cases, the
Figure 5.1. Different types of attractors. (a) Fixed-point attractor, (b) limit-cycle attractor, (c) limit torus characterized by two periodicities P1 and P2 that form an irrational ratio, and (d) chaotic attractor. (e) Multiple attractors, consisting of a fixed-point attractor and a limit-cycle attractor. Note that all four types of attractors are illustrated in terms of continuous-time dynamical systems.
attractor to which the system converges depends on the initial state. In Figure 5.1e, a state trajectory starting from the left side of the dotted curve will converge to a fixed point, and one starting from the right side to a limit cycle. Next, we look at the case of discrete-time dynamics in detail.

5.1.1 Discrete-Time Systems

Let us examine the so-called logistic map, introduced by Robert May (1976), as a simple illustrative example of Eq. 1 with a one-dimensional dynamic state. Even with a one-dimensional state, its behavior is nontrivial, as will be seen in the following. The logistic map is written in discrete-time form as:
x_{t+1} = a x_t (1 − x_t)    (Eq. 4)
Here, x_t is a one-dimensional dynamic state and a is a parameter. If a particular value is taken for the initial state, x_0, the map recursively generates a trajectory x_1, x_2, …, x_n, as shown in the diagram at the left of Figure 5.2a.
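As a concrete illustration (a sketch of my own, not code from the book), the following Python script iterates Eq. 4 for the three parameter values used in Figure 5.2b; it reproduces the convergence to a fixed point, a period-2 limit cycle, and a chaotic trajectory, respectively:

import numpy as np

def logistic_trajectory(a, x0, steps):
    """Iterate the logistic map x_{t+1} = a * x_t * (1 - x_t)."""
    xs = np.empty(steps + 1)
    xs[0] = x0
    for t in range(steps):
        xs[t + 1] = a * xs[t] * (1.0 - xs[t])
    return xs

for a in (2.6, 3.2, 3.6):
    xs = logistic_trajectory(a, x0=0.1, steps=100)
    # Inspect the tail of the trajectory, after transients have decayed.
    print(f"a = {a}: last values {np.round(xs[-4:], 3)}")

# a = 2.6 converges to a fixed point near 0.615,
# a = 3.2 alternates between roughly 0.513 and 0.799 (a period-2 limit cycle),
# a = 3.6 wanders without periodicity (chaos).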
Figure 5.2. A logistic map. (a) The dynamic iteration corresponding to a logistic map is shown on the left, and its bifurcation diagram with respect to the parameter a on the right. (b) Time developments of the state for different values of a, where a fixed-point attractor, a limit-cycle attractor, and a chaotic attractor appear from left to right for a = 2.6, 3.2, and 3.6, respectively.
Now, let’s examine how the dynamical structure of a logistic map changes when the parameter a is varied continuously. For this purpose, a bifurcation diagram of the logistic map is shown in Figure 5.2 a, right. This diagram shows an invariant set of attractors for each value of a, where an invariant set means a set of points within the convergence trajectory as mentioned previously. For example, when a is set to 2.6, the trajectory of x t converges toward a point around 0.61 from any initial state, and therefore this point is a fixed- point attractor (see Figure 5.2b left.) When a is increased to 3.0, the fixed-point attractor bifurcates i nto a limit-cycle attractor with a period of 2. With a set to 3.2, a limit cycle alternating between 0.52 and 0.80 appears (see Figure 5.2 b middle.),
and when a is further increased to 3.43, the limit cycle with a period of 2 bifurcates into one with a period of 4. A limit cycle alternating sequentially between 0.38, 0.82, 0.51, and 0.88 appears when a is set to 3.5, whereas when a is increased to 3.60, further bifurcation takes place from a limit cycle to a chaotic attractor characterized by an invariant set with an infinite number of points (see Figure 5.2b, right). The time evolutions of x starting from different initial states are plotted for these values of a, where it is clear that the transient dynamics of the trajectory of x converge toward those fixed-point, limit-cycle, and chaotic attractors. It should be noted that no periodicity is seen in the case of chaos. We'll turn now to look briefly at a number of characteristics of chaos. One of the essential characteristics of chaos is its sensitivity to initial conditions. In chaos, when two trajectories are generated from two initial states separated by a negligibly small distance in phase space, the distance between these two trajectories increases exponentially as iterations progress. Figure 5.3a shows an example of such development. This sensitivity to initial conditions accounts for the ability of chaos to generate nonrepeatable behaviors even when only a negligibly small perturbation is applied to the initial conditions. This peculiarity of chaos can be explained by the process of stretching and folding in phase space, as illustrated in Figure 5.3b. If a is set to 4.0, the logistic map generates chaos that covers the range of x from 0.0 to 1.0, as can be seen in Figure 5.2a. In this case, the range of values for x_0 between 0.0 and 0.5 is mapped to x_1 values between 0.0 and 1.0 with magnification, whereas x_0 values between 0.5 and 1.0 are mapped to x_1 values between 1.0 and 0.0 (again with magnification, but in the opposite direction), as can be seen in Figure 5.3b. This essentially represents the process of stretching and folding in a single mapping step of the logistic map. Two adjacent initial states, denoted by a dot and a cross, are mapped to two points that are slightly further apart from each other after the first mapping. When this mapping is repeated n times, the distance between the two states increases exponentially, resulting in the complex geometry generated for x_n by means of iterated stretching and folding. This iterated stretching and folding is considered to be a general mechanism for generating chaos. Next, consider an interesting relation between chaotic dynamics and symbolic processes. If we observe the output sequence of the logistic map and label it with two symbols, "H" for values greater than 0.5 and "L" for those less than or equal to 0.5, we get probabilistic sequences of alternating "H" and "L." When the parameter a is set at 4.0, it is known
Figure 5.3. Initial sensitivity of chaotic dynamics. (a) The distance between two trajectories (represented by solid and dashed lines) starting from initial states separated by a distance of ϵ in phase space grows exponentially over time in the chaos generated by a logistic map with a set to 3.6. (b) The mechanism of generating chaos by stretching and folding.
that the logistic map generates "H" or "L" with equal probability and with no memory, like a coin flip. This can be represented by a one-state probabilistic finite state machine (FSM) with equal output probabilities for "H" and "L" from this single state. If the parameter a is changed to a different value in the chaotic region, a different probabilistic FSM, with a different number of discrete states and different probability assignments for the output labels, is reconstructed for each. This is called symbolic dynamics (Crutchfield & Young, 1989; Devaney, 1989), which provides a theorem connecting real-number dynamical systems and discrete symbol systems.
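A minimal sketch of this labeling procedure (my own illustration, not from the book) can be written as follows; at a = 4.0 the empirical symbol frequencies approach 0.5 each, consistent with the coin-flip behavior just described:

import numpy as np

def symbolize(a, x0=0.2, steps=100_000, transient=1_000):
    """Iterate the logistic map and label each state 'H' (> 0.5) or 'L' (<= 0.5)."""
    x = x0
    symbols = []
    for t in range(steps):
        x = a * x * (1.0 - x)
        if t >= transient:          # discard the initial transient iterations
            symbols.append('H' if x > 0.5 else 'L')
    return symbols

syms = symbolize(a=4.0)
p_h = syms.count('H') / len(syms)
print(f"P(H) = {p_h:.3f}, P(L) = {1 - p_h:.3f}")   # both close to 0.5

Estimating the conditional probabilities of "H" and "L" given the preceding symbols, for a values elsewhere in the chaotic region, would likewise sketch out the multi-state probabilistic FSMs mentioned above.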
Figure 5.4. Tangency in nonlinear mapping. The passing through of the state x slows down in the vicinity of the tangency point.
One interesting observation about logistic maps in terms of symbolic dynamics is that the complexity of the symbolic dynamics, measured by the number of states in the reconstructed probabilistic FSM, can become infinite, especially in the parameter region at the onset of chaos, at the ends of window regions in which the periodicity of the attractor moves from finite to infinite (Crutchfield & Young, 1989). It is known that nonlinear dynamical systems in general develop critical behaviors, exhibiting state trajectories of infinite complexity, at the "edge of chaos," including at the ends of window parameter regions, where quite rich dynamic patterns following a power law can be observed. The edge of chaos can also be observed under another critical condition, when "tangency" exists in the mapping function, as shown in Figure 5.4. When the curve of the mapping function becomes tangent to the line of identity mapping, passing through the tangent point can take up to an infinite number of steps, depending on the value of x at entry. This generates the phenomenon known as intermittent chaos, in which the passing through appears intermittently, after several steps or sometimes after infinitely many steps. These properties of the edge of chaos under critical conditions are revisited in later chapters as we examine the behavioral characteristics of the neurorobots observed in our experiments.
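To see this intermittency numerically, one can use the well-known tangent bifurcation that opens the period-3 window of the logistic map at a = 1 + √8 ≈ 3.8284 (this particular parameter choice is my illustrative addition, not an example given in the text). Slightly below that value, long, nearly period-3 "laminar" phases are interrupted by chaotic bursts:

import numpy as np

A_TANGENT = 1.0 + np.sqrt(8.0)   # tangent bifurcation of the period-3 window, ~3.8284

def laminar_lengths(a, x0=0.3, steps=200_000, tol=1e-3):
    """Measure lengths of nearly period-3 (laminar) episodes of the logistic map."""
    x = x0
    history = []
    lengths, run = [], 0
    for _ in range(steps):
        x = a * x * (1.0 - x)
        history.append(x)
        if len(history) > 3 and abs(x - history[-4]) < tol:
            run += 1                # still close to period-3 motion
        elif run > 0:
            lengths.append(run)     # a chaotic burst ended the laminar phase
            run = 0
    return lengths

for da in (1e-3, 1e-4, 1e-5):       # distance below the tangency
    lens = laminar_lengths(A_TANGENT - da)
    print(f"a = A_TANGENT - {da:g}: mean laminar length {np.mean(lens):.0f}")

# The closer a sits to the tangent point, the longer the laminar phases become,
# approaching infinite periodicity exactly at the tangency.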
5.1.2 Continuous-Time Systems

Next, let's examine the case of continuous time. We'll take the Rössler system (Rössler, 1976) as a simple example, which can be described by the following set of ordinary differential equations:
\dot{x} = −y − z
\dot{y} = x + ay
\dot{z} = b + z(x − c)    (Eq. 5)

This continuous-time nonlinear dynamical system is defined by a three-dimensional state (x, y, and z), three parameters (a, b, and c), and no inputs. If we conduct a phase space analysis on this system, we can see different dynamical structures appearing for different settings of the parameters a, b, and c. As shown in Figure 5.5, continuous trajectories of the dynamical state projected into the two-dimensional space (x, y) converge toward three different types of attractors (fixed point, limit cycle, or chaotic) depending on the values of the parameters. It should be noted that in each case the trajectory converges to the same attractor regardless of the initial state. Such an attractor is called a global attractor, and the chaotic attractor shown in (c) is the Rössler attractor. The phenomena corresponding to these changes in the dynamical structure caused by parameter bifurcation are quite similar to those observed in the case of the logistic map. The mechanism generating chaos in the Rössler attractor can be explained by the process of stretching and folding mentioned previously. In the Rössler attractor, a bundle of trajectories constituting a sheet rotates in a counterclockwise direction, accompanied by a one-time folding and stretching. If we take a section of the sheet, known as a Poincaré section (Figure 5.5d), we'll see a line segment consisting of an infinite number of trajectory points. During a single rotation, this line segment is stretched and folded once again, being mapped onto itself (see Figure 5.5e). If this process is iterated, the sensitivity of this system to initial conditions becomes apparent in the same way as with the logistic map.
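The Rössler attractor is easy to reproduce numerically; the sketch below (my own, not from the book) integrates Eq. 5 with the chaotic parameter setting used in Figure 5.5c:

import numpy as np
from scipy.integrate import solve_ivp

def rossler(t, state, a, b, c):
    """Right-hand side of the Rössler system (Eq. 5)."""
    x, y, z = state
    return [-y - z, x + a * y, b + z * (x - c)]

# Chaotic parameter setting from Figure 5.5c.
a, b, c = 0.2, 0.2, 5.7
sol = solve_ivp(rossler, t_span=(0.0, 500.0), y0=[1.0, 1.0, 1.0],
                args=(a, b, c), max_step=0.01)

x, y = sol.y[0], sol.y[1]
print(f"x range: [{x.min():.2f}, {x.max():.2f}]")

# Plotting (x, y) after discarding the initial transient traces out the
# familiar counterclockwise, stretched-and-folded Rössler attractor.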
Figure 5.5. Different attractors appearing in the Rössler system. (a) A fixed-point attractor (a = −0.2, b = 0.2, c = 5.7), (b) a limit-cycle attractor (a = 0.1, b = 0.1, c = 4.0), and (c) a chaotic attractor (a = 0.2, b = 0.2, c = 5.7). Illustrations of (d) the Poincaré section and (e) the process of folding and stretching in the Rössler attractor that accounts for the mechanism of generating chaos.
5.1.3 Structural Stability

This subsection explains why structural stability is an important characteristic of nonlinear dynamical systems. Importantly, I will argue that one emergent property of nonlinear dynamical systems is the appearance of a particular attractor configuration for any given dynamical system. A particular equation describing a dynamical system can indicate the direction of change of the state at each local point in terms of a vector field. However, the vector field itself cannot tell us what the attractor looks like. The attractor emerges only after a certain number of iterations have been performed, through the transient process of converging toward the attractor. An important point here is that attractors, as trajectories of steady states, cannot exist by themselves in isolation. Rather, they need to be "supported" by the transient parts of the vector field that converge toward them. In other words, the transient parts of the vector flow make attractors stable, as illustrated in Figure 5.6a.
Figure 5.6. Vector flow. (a) Appearance of a limit-cycle attractor in the vector field of a particular two-dimensional continuous dynamical system with the system state (x, v), in which the vector flow converges toward a cyclic trajectory. (b) The vector field of a harmonic oscillator, in which the flow is not convergent but forms concentric circles.
This is the notion behind the structural stability of attractors. To provide a more intuitive explanation of this concept, let's take a counterexample: a system that is not structurally stable. Sometimes I ask students to give me an example of a system that generates oscillation patterns, and a common answer is a sinusoidal function or a harmonic oscillator, such as the frictionless spring-mass system described by Eq. 6:

m\dot{v} = −kx
\dot{x} = v    (Eq. 6)

Here, x is the one-dimensional position of a mass m, v is its velocity, and k is the spring coefficient. The equation represents a second-order dynamic system without damping terms. A frictionless spring-mass system can indeed generate sinusoidal oscillation patterns. However, such patterns are not structurally stable, because if we apply a force to the mass of the oscillator instantaneously, the amplitude of oscillation changes immediately, and the original oscillation pattern is never recovered automatically (again, it is frictionless). If the vector field is plotted in (x, v) space, we see that the vector flow describes concentric circles, with no convergent flow that would constitute a limit-cycle attractor (see Figure 5.6b). Indeed, a sinusoidal wave function is also simply the trace of one point on a circle as it rolls along a plane. Most rhythmic patterns in biological systems are thought to be generated by limit-cycle attractors because of their potential stability against
perturbations. These include the central pattern generators in neural circuits for the heartbeat, locomotion, breathing, swimming, and many other behaviors, as described briefly in the next section. Such limit-cycle attractor dynamics in real physical systems are generated by nonlinear dynamical systems called dissipative systems. A dissipative system consists of an energy dissipation part and an energy supply part. If the amounts of energy dissipation and energy supply during one cycle of oscillation are balanced, the result is the formation of an attractor of the limit-cycle type (or, under certain conditions, the generation of chaos). Energy can be dissipated by damping caused by friction in mechanical systems or by electrical resistance in electrical circuits. When a larger or smaller amount of energy is supplied momentarily due to a perturbation from an external source, the state trajectory deviates and becomes transient. However, it returns to the original attractor region by means of automatic compensation, dissipating an amount of energy corresponding to the input energy. On the other hand, a harmonic oscillator without a damping term, such as that shown in Eq. 6, is not a dissipative system but an energy-conserving system. There is no damping term to dissipate energy from the system. Once perturbed, its state trajectory will not return to the original one. In short, the structural stability of dynamic patterns, in terms of physical movements or neural activity in biological systems, can be achieved through attractor dynamics by means of a dissipative structure. Further, the particular attractors appearing in different cases are the products of emergent properties of such nonlinear (dissipative) dynamic systems. Indeed, Neo-Gibsonian psychologists have taken advantage of these interesting dynamical properties of dissipative systems to account for the generation of stable but flexible biological movements. A small simulation contrasting a dissipative oscillator with the conservative one of Eq. 6 is sketched below. The next section then explores these concepts by introducing the Gibsonian approach first, followed by Neo-Gibsonian variants and infant developmental psychology using the dynamical systems perspective.
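The sketch below is my own illustration; the van der Pol oscillator is my choice of dissipative system, not one discussed in the text. Both it and the frictionless spring-mass system of Eq. 6 are given an identical instantaneous kick. The dissipative system relaxes back to its original oscillation amplitude; the conservative one keeps the perturbed amplitude forever:

import numpy as np
from scipy.integrate import solve_ivp

def van_der_pol(t, s, mu=1.0):
    """Dissipative oscillator: a nonlinear damping term balances energy supply."""
    x, v = s
    return [v, mu * (1.0 - x**2) * v - x]

def harmonic(t, s, k=1.0, m=1.0):
    """Frictionless spring-mass system (Eq. 6): energy conserving."""
    x, v = s
    return [v, -k / m * x]

def amplitude_after_kick(rhs):
    # Let the system settle, kick the velocity, then let it evolve again.
    s = solve_ivp(rhs, (0, 50), [0.1, 0.0], max_step=0.01).y[:, -1]
    s[1] += 1.0                                   # instantaneous perturbation
    tail = solve_ivp(rhs, (0, 50), s, max_step=0.01).y[0]
    return np.abs(tail[len(tail) // 2:]).max()    # amplitude after the transient

print(f"van der Pol amplitude after kick: {amplitude_after_kick(van_der_pol):.2f}")
print(f"harmonic amplitude after kick:    {amplitude_after_kick(harmonic):.2f}")

# The van der Pol amplitude relaxes back to ~2.0, the same limit-cycle
# amplitude as before the kick; the harmonic oscillator, which started
# with amplitude 0.1, permanently retains the larger, perturbed amplitude.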
5.2. Gibsonian and Neo-Gibsonian Approaches

5.2.1 The Gibsonian Approach

A concept central to this approach, known as affordance, has significantly influenced not only mainstream psychology and philosophy of
the mind, but also synthetic modeling studies, including artificial intelligence and robotics. In the original theory of affordance proposed by J. J. Gibson (1979), affordance was defined as "all possibilities for actions latent in the environment." Put another way, affordance can be understood as the behavioral relations that animals are able to acquire in interaction with their environments. Relationships between actors and objects within these environments afford agents opportunities to generate adequate behaviors. For example, a chair affords sitting on it, and a door knob affords pulling or pushing a door open or closed, free from the resistance afforded by the door's locking mechanism. Many of Gibson's considerations focused on the fact that essential information about the environment comes by way of human processing of optical flow. Optical flow is the pattern of motion sensed by the eye of an observer. By considering that optical flow information can be used to perceive one's own motion pattern and to control one's own behavior, Gibson came up with the notion of affordance constancy. He illustrated this concept with the example of a pilot flying toward a target on the ground, adjusting the direction of flight so that the focus of expansion (FOE) in the visual optical flow becomes superimposed on the target (see Figure 5.7a). This account was inspired by his own experience in training pilots to develop better landing skills during World War II. A similar example, closer to everyday life, is that we walk along a corridor while registering the difference from zero between the optical flow vectors along the two sides of the corridor, which allows us to walk down the middle of the corridor without colliding with the walls (see Figure 5.7b). These examples suggest that for each behavior there is a crucial perceptual variable (in Gibson's two examples, the distance between the FOE and the target, and the vector difference between the optical flows for the two walls), and that body movements are generated to keep these perceptual variables at constant values. Assuming coupled dynamics between the environment and small controllers inside the brain, the role of the controllers is to preserve perceptual constancy. A simple dynamical systems account shows how this constancy may be maintained by assuming the existence of a fixed-point attractor, which ensures that the perceptual variables always converge to a constant state. Andy Clark, a philosopher in Edinburgh, has been interested in the role of embodiment in generating situated behaviors from the Gibsonian perspective. As an example, he analyzed how an outfielder positions himself to catch a fly ball (Clark, 1999). In general, this action is thought
Figure 5.7. Gibson’s notion of optical constancy. (a) Flying while superimposing the focus of expansion on the target heading and (b) walking along a corridor while balancing optical flow vectors against both side walls. Redrawn from Gibson (1979).
to require complicated calculations of variables such as the arc, speed, acceleration, and distance of the ball. However, there is actually a simple strategy for catching it: If the outfielder continues to adjust his movement so that the ball appears to approach in a straight line in his visual field, the ball eventually falls to him. By maintaining this coordination for perceptual constancy, he can catch the fly ball easily. Clark explains that the task is to maintain, by making multiple, ongoing, real-time adjustments to the running motion, a kind of coordination between the inner and the outer. This means that coordination dynamics like this appears naturally under relatively simple principles, such as perceptual constancy, instead of through complicated computation involving representation in an objective, simulated Cartesian coordinate system.

5.2.2 Neo-Gibsonian Approaches

In the 1980s, so-called Neo-Gibsonian psychologists such as Turvey, Kugler, and Kelso started investigating how to achieve the coordination of many degrees of freedom by applying the ideas of dissipative structures from nonlinear dynamics to psychological observations of human and animal behavior (see the seminal book by Scott Kelso, 1995). They considered that the ideas of dissipative structures, especially concerning limit-cycle attractor dynamics, could serve as a basic principle in organizing coherent rhythmic movement patterns such as walking, swimming, breathing, and hand waving, as described briefly in the previous section. The important theoretical ingredients of these ideas are entrainment and phase transitions. First, coupled oscillators that initially oscillate with
different phases and periodicities can, by mutual entrainment under certain conditions, converge to a global synchrony with reduced dimensionality. Second, the characteristics of this global synchrony can be changed drastically by a shift of an order parameter of the dynamic system by means of a phase transition. Let's look at this in more detail by reviewing a representative experimental study conducted by Kelso and colleagues (Schoner & Kelso, 1988). In the experiment, subjects were asked to wiggle the index fingers of their left and right hands in the same direction (different muscles activated; antiphase) in synchrony with a metronome. When the metronome was speeded up gradually, the finger movement pattern suddenly switched from the same direction to the opposite direction (same muscles activated; in-phase). It was observed that the relative phase changed suddenly from 180 degrees to 0 degrees (see the left-hand panel in Figure 5.8).
Figure 5.8. The phase transition model by Kelso (1995) for explaining the dynamic shifts seen in bimanual finger movements. The left-hand panel shows how the oscillation coordination between the right and left index fingers changes when the leading frequency is increased. The right-hand panel shows the corresponding change in the energy landscape.
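The text does not spell out the functional form of this energy landscape; a standard formalization from the literature is the Haken-Kelso-Bunz (HKB) potential V(φ) = −a cos φ − b cos 2φ over the relative phase φ. The sketch below (my addition, under that assumption) shows the antiphase minimum at φ = 180 degrees disappearing as the ratio b/a falls, which in the HKB account corresponds to a faster metronome:

import numpy as np

def hkb_potential(phi, a=1.0, b=1.0):
    """Haken-Kelso-Bunz potential over the relative phase phi (radians)."""
    return -a * np.cos(phi) - b * np.cos(2.0 * phi)

phi = np.linspace(-np.pi, np.pi, 721)
for ratio in (1.0, 0.5, 0.25, 0.1):   # b/a shrinks as movement frequency rises
    v = hkb_potential(phi, b=ratio)
    # Antiphase (phi = +/- pi) stays a local minimum only while b/a > 1/4:
    # the curvature of the landscape at phi = pi is V''(pi) = -a + 4b.
    stable = (-1.0 + 4.0 * ratio) > 0
    print(f"b/a = {ratio:.2f}: antiphase {'stable' if stable else 'unstable'}, "
          f"V(pi) = {v[-1]:.2f}")

# For b/a > 0.25 the landscape has minima at both 0 and pi; below that, the
# antiphase valley flattens away and only in-phase remains, matching the
# sudden 180-to-0-degree switch observed in the experiment.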
After this experiment, Kelso and colleagues showed by computer simulation that the observed dynamic shift is due to a phase transition from one self-organizing dynamic structure to another, given changes in an order parameter of the system (the speed of the metronome, in this example). When a hypothetical energy landscape is computed for the movement patterns along with the order parameter of metronome speed (see the right-hand panel in Figure 5.8), the antiphase pattern is stable, with its energy minimum state, when the metronome speed is low. However, the antiphase pattern becomes unstable as the metronome speed increases (the parameter introduces too much energy into the system), and the behavior is modulated toward the realization of a more stable system and the corresponding energy minimum, suddenly switching the system state from antiphase to in-phase. Dramatic shifts in dynamic system state such as those seen in the bimanual finger movement illustration can be explained by means of the phenomenon of phase transition. Indeed, a diverse range of phenomena characterized by similar shifts in animal and human movement patterns appears to be very effectively explained in terms of phase transitions. Good examples include the dynamic shift from trot to gallop in horse locomotion given a change in the system parameter "running speed," as well as the shift from walk to run in human locomotion. It is common experience that the middle state, a walk-run, is more difficult to maintain (at least without lots of practice) than either of the other behaviors. This result accords with a central notion in Neo-Gibsonian approaches: that behaviors are organized not top-down by an explicit central commander, but by implicit synergy among local elements including neurons, muscles, and skeletal mechanics, and that these behaviors represent emergent characteristics of dissipative structures.

5.2.3 Infant Developmental Psychology

Neo-Gibsonian theories helped to give birth to another dynamic systems theory, one that accounts for infant development. Esther Thelen and Linda B. Smith wrote in their seminal textbook, A Dynamic Systems Approach to the Development of Cognition and Action, that:

We invoke Gibson's beliefs that the world contains information and that the goal of development is to discover relevant information in order to make a functional match between what the environment affords and what the actor can and wants to do. (Thelen & Smith, 1994, p. 9, Introduction)
They suggest that development is better understood as the emergent product of many decentralized and local interactions occurring in real time between parts of the brain, the body, and the environment, rather than as sequences of events preprogrammed in our genes. For example, crawling is a stable behavior for infants for several months. However, when they newly acquire the movement patterns of walking upright, the movement patterns of crawling become unstable. Smith and Thelen hold that this happens not as the result of a genetic preprogram but as the result of an efficient solution generated through self-organization (Smith & Thelen, 2003). Following this line of thinking, Gershkoff-Stowe and Thelen (2004) provide a remarkable account of so-called "U-shaped" development, a phenomenon whereby previously performed behaviors regress or disappear, only to recover or reappear with even better performance later on. A typical example can be seen in language development around 2 or 3 years of age, when children, after several months of correct usage, often incorrectly use words like "foots" and "goed," a phenomenon known as overregularization. They eventually resume using these words correctly. Another example is the walking reflex. When a newborn baby is held so that the feet lightly touch a solid surface, she or he shows walking-like motion with alternate stepping. However, this reflexive behavior is scarcely observed after a few months and does not reappear until just prior to walking. One more example is the perseverative reaching observed in the so-called A-not-B task, originally demonstrated by Jean Piaget, known as the father of developmental psychology, and illustrated in Figure 5.9. In this task, 8- to 10-month-old infants are cued to recover a hidden object from one of two identical hiding places (see Figure 5.9). Recovery is repeated several times at the first location "A" before the experimenter switches the hiding place to the second location "B." Although the infant watches the toy being hidden at the new location "B," if there is a delay between the hiding and allowing the child to reach, infants robustly return to the original location "A." This is known as perseverative reaching. This reaching can even be observed in the not-hidden-toy case (i.e., when an explicit cue indicates the correct location). An interesting observation in the not-hidden condition is that infants around 5 months old are correct at location "B" (around a 70% success rate) and show less perseveration than infants around 8 months old, who are incorrect (around a 20% success rate). This perseverative behavior is not observed in infants older than 12 months of age.
Figure 5.9. Piaget’s A-not-B task. First, i n 1, an attractive obj ect is h idden at location “A” (left-hand side). The infant then repeatedly retrieves the object, in 2 and 3, from the correct location of “A.” In 4, the object is then hidden at location “B” (right-hand side) while the infant attends to this. However, with a delay between seeing the hiding and retrieval, the infant fails to retr ieve the object at the correct location “B.”
What is the underlying mechanism in these examples of U-shaped development? Gershkoff-Stowe and Thelen (2004) argue that U-shaped development is not caused by regression or loss of a single element in the motor, perceptual, or memory system alone. Instead, U-shaped behavior is the result of a continuously changing configuration of mutually interacting components, including both mental and behavioral components. They write, "The issue is not how a behavior is 'lost' or 'gets worse,' but how the component processes can reorganize to produce such dramatic nonlinearities in performance" (Gershkoff-Stowe & Thelen, 2004, p. 16). In the case of perseverative reaching, although repeated recoveries from location "A" can be considered to reinforce a memory bias toward selecting location "A" again on the next reach, this is not the only cause. It was found that the hand trajectories in the repeated recoveries of 8-month-old infants become increasingly similar to those of 5-month-old infants, who are relatively immature in controlling their hand reaching movements. It was also found that changing the hand trajectory by adding weights to the infants' arms significantly
decreased the perseveration. The point here is that the mutual reinforcement of the memory bias and the persistent trajectories in the reaching movement through the repeated recoveries results in the formation of a strong habit of reliable perseverative reaching. This account has been supported by simulation studies using the dynamic neural field model (Schoner & Thelen, 2006). This perseverative reaching is at its peak at 8 months of age and starts to drop off thereafter as other functions mature to counter it, such as attention switching and attention maintenance, which allow for tracking and preserving the alternative cue appearing in the second location “B.” Smith and Thelen (2003) explain that infants who have had more experience exploring environments by self-locomotion show greater visual attention to the desired object and its hidden location. This account of how reaching for either “A” or “B” is determined by infants is parallel to what Spivey (2007) has discussed in terms of the “continuity of minds.” He considers that even discrete decisions for selecting actions might be delivered through the process of gradually settling partially active and competing neural activities involved with multiple psychological processes. And again, the emergence of U-shaped development is a product of dynamic interactions between multiple contingent processes both internal and external to infants (Gershkoff-Stowe & Thelen, 2004). The next subsection looks at the development of a cognitive competency, namely imitation, which has been considered to play an important role in the cognitive development of children.
5.2.4 Imitation
It has been considered that imitation and observational learning are essential for children to acquire a wide range of behaviors because learning by imitation is much more efficient than learning through trial and error by each individual alone. Jean Piaget proposed that imitation in infants develops through six discrete stages until 18 to 24 months of age (Piaget, 1962). The first stage starts with the sensory-reflex response of newborns, which is followed by repetition of some repertoires by chance in the second stage. A drastic differentiation in development comes with deferred imitation at around 8 to 12 months in the fourth stage. Here, an ability to reproduce a modeled activity that has been observed at some point in the past emerges. Piaget emphasized this change by
suggesting that this stage marks the onset of mentalization capabilities in infants. This mentalization capability is further developed in the sixth stage at around 18 to 24 months, when some symbolic-level mental representation and manipulation can be observed. A typical example is the appearance of pretend play. For example, a child pretends to call by using a banana instead of a real phone after observing the actions of his or her parents. Although Piaget's emphasis was cognitive development toward mentalization or symbolism appearing in the later stages, some recent studies have pursued the suspicion that the roots of human cognition may be found in the analysis of early imitation, and so have focused on how neuronal mechanisms of imitation appear in much earlier stages. A seminal study by Meltzoff and Moore (1977) showed that human neonates can imitate facial gestures of adults such as tongue protrusion, mouth opening, and lip protrusion. This finding was nontrivial because it implies that neonates can match their own unseen behaviors with those demonstrated by others. Even Piaget believed that facial imitation could appear only after 8 months of age. Although the exact mechanisms enabling these imitative behaviors in neonates are still a matter of debate, Meltzoff (2005) has hypothesized a “like me” mechanism that connects the perceptions of others “like me” with one's own capacities, therefore grounding an embodied understanding of others' minds in enactive imitation. In the first stage, in newborns, innate sensory–motor mapping can generate the aforementioned imitative behaviors by means of automatic responses. In the second stage, infants experience regular relationships between their mental states and actions generated repeatedly, and thus associations between them are learned. Finally, in the third stage, infants come to understand that others who act “like me” have mental states “like me.” Along a similar line, Jacqueline Nadel (2002) proposed that imitation is a means to communicate with others. Nadel observed a group of preverbal infants in a natural social play setting involving a type of frequently observed communicative interaction: turn taking, or switching roles among two or three infants. Typical turn taking was observed when an infant showed another infant an object similar to the one he or she was holding. In most cases, the partner infant took the object and imitated its usage. Sometimes, however, the partner refused to do so or ignored the other. In these cases, the initiator left the object and turned to imitate the partner's ongoing behavior.
Another remarkable finding by Nadel (2002) was that pairs of preverbal infants often exhibited imitation of instrumental activity with synchrony between them. Figure 5.10 shows that when one infant demonstrated an unexpected use of objects (carrying an upside-down chair on his head), the partner imitated this instrumental activity during their imitative exchanges. Based on these observations and others, Nadel and colleagues argue that although immediate imitation generated during behavioral exchanges may not always be an intelligent process, as Piaget pointed out, infants at the very least “know how” to communicate with each other (Andry et al., 2001). This intriguing communicative activity may not require much in the way of mental representation and manipulation or symbolism, but rather depends on synchronization and rhythm, which appear spontaneously in the dynamical processes of sensory–motor mapping between the perception of others “like me” and one's own actions. The next section describes a new movement in artificial intelligence and robotics guided by these insights and many others from contemporary developmental psychology.
Figure 5.10. Preverbal infants exhibit instrumental activities with synchrony during imitative exchange. Reproduced from Nadel (2002) with permission.
5.3. Behavior-Based Robotics
At the end of the 1980s, a paradigm shift occurred in artificial intelligence and robotics research. This shift occurred with the introduction of behavior-based robotics by Rodney Brooks at MIT. It should be noted, however, that just a few years before Brooks started his project, Valentino Braitenberg, a German neuroanatomist, published a book entitled Vehicles: Experiments in Synthetic Psychology (Braitenberg, 1984) describing the psychological perspective that led to the behavior-based robotics approach. The uniqueness of the book is its attempt to explore possible brain-psychological mechanisms for generating behavior via synthesis. For example, Braitenberg’s “Law of uphill analysis and downhill invention” suggests that it is more difficult to understand a working mechanism or system just from looking at it externally than it is to create it from scratch, an insight parallel to the quote from Feynman introducing this chapter. Another interesting feature of Braitenberg’s book is that all of the synthesis described is done through thought experiments rather than by using real robots or computer simulations—although many researchers reconstructed these experiments using actual robots years later. Braitenberg’s thought experiments are simple, yet provide readers with valuable clues about the cognitive organization underlying adaptive behaviors. Some representative examples of his thought experiments are introduced as follows, because they offer a good introduction to understanding the behavior-based approach.
5.3.1 Braitenberg’s Vehicle Thought Experiments
In his book, Braitenberg introduces thought experiments concerning 14 different types of vehicles. Here, we confine ourselves to looking at Vehicles 2, 3, and 4 as representative examples. Each of the three vehicles is equipped with a set of paired sensors on the front left- and right-hand sides of its body. The sensory inputs are transmitted to the left and right wheel drive motors at the rear through connecting lines which are analogous to synaptic connections. Let’s begin with Vehicle 2a shown in Figure 5.11. The vehicle has light intensity sensors to the front on each side that are connected to its corresponding rear motors in an excitatory manner
Figure 5.11. Braitenberg vehicles 2a and 2b (top) and 3a and 3b (bottom).
(same-side excitatory connectivity). If a light source is located directly ahead of the vehicle, it will crash into the light source by accelerating the motors on both sides equally. However, if there is a slight deviation toward the light source, the deviation will be increased by accelerating the motor on the side closer to the light source. This eventually generates radical avoidance of the light source. On the other hand, if each sensor is connected to a motor on the opposite side (cross-excitatory connectivity), as shown for Vehicle 2b in Figure 5.11, the vehicle always crashes into the light source. This is because the motor on the opposite side of the light source accelerates more and thus the vehicle moves toward the light source. Vehicles 2a and 2b are named Coward and Aggressive, respectively. Now, let’s suppose that the connectivity lines, rather than being excitatory as for Vehicle 2, are inhibitory for Vehicle 3 (Figure 5.11). Now Vehicle 3 has drastically different behavior characteristics from Vehicle 2. First, let’s look at Vehicle 3a which has same-side inhibitory connectivity. This vehicle slows down in the vicinity of the light source. It is gradually attracted to the light source and finally stops close enough (perhaps depending on friction of the wheels and other factors). If the vehicle deviates slightly to one side from the source, the motor on the opposite side slows down because it is inhibited by the sensor that perceives a stronger stimulus from the source. If it deviates to the right, then the left wheel is inhibited, and vice versa.
Figure 5.12. Braitenberg vehicle 4. (a) Nonlinear maps from sensory intensity to motor velocity assumed for this vehicle and (b) complex behaviors that emerge with more complex maps.
Eventually, the vehicle shifts back toward the source and finally stops to stay in the vicinity of the source. In the case of Vehicle 3b, which has cross-inhibitory connectivity, although this vehicle also slows down in the presence of a strong light stimulus, it gently turns away from the source, employing the opposite control logic of Vehicle 3a. The vehicle then heads for another light source. Vehicles 3a and 3b are named Lover and Explorer, respectively. Vehicle 4 adds a trick in the connectivity lines: The relationship between the sensory stimulus and the motor outputs is changed from a monotonic one to a non-monotonic one, as shown in Figure 5.12a. Because of the potential nonlinearity in the sensory–motor response, the vehicle will not just monotonically approach the light sources or escape from them. It can happen that the vehicle approaches a source but changes course to deviate away from it when coming within a certain distance of it. Braitenberg imagined that repetitions of this sort of approaching and moving away from light sources can result in the emergence of complex trajectories, as illustrated in Figure 5.12b. Simply by adding some nonlinearity to the sensory–motor mapping functions of the simple controllers, the resultant interactions between the vehicle and the environment (light sources) can become significantly complex. These are very interesting results. However, being thought experiments, this approach is quite limited. Should we wish to consider emergent behaviors beyond the limits of such thought experiments, we require computer simulations or real robotics experiments.
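In that spirit, the sensory–motor wiring of Vehicles 2 and 3 is simple enough to simulate in a few dozen lines. The following is a minimal sketch, not from Braitenberg’s book: a differential-drive agent with two front light sensors, where a wiring option selects same-side or crossed connections and excitatory or inhibitory signs. The intensity model and all parameters are illustrative assumptions.

```python
# A minimal sketch (not from Braitenberg's book) of Vehicles 2 and 3 as a
# differential-drive agent; the intensity model and all parameters are
# illustrative assumptions.
import math

def sensor_intensity(sensor_pos, source_pos):
    """Assumed light model: intensity falls off with squared distance."""
    dx = source_pos[0] - sensor_pos[0]
    dy = source_pos[1] - sensor_pos[1]
    return 1.0 / (1.0 + dx * dx + dy * dy)

def step(x, y, heading, source, wiring, dt=0.05):
    """Advance one time step; wiring = (side, sign) with side in
    {"same", "cross"} and sign in {"excite", "inhibit"}, selecting
    Vehicle 2a, 2b, 3a, or 3b."""
    offset = 0.2  # sensors mounted at the front-left and front-right
    left = (x + offset * math.cos(heading + 0.5),
            y + offset * math.sin(heading + 0.5))
    right = (x + offset * math.cos(heading - 0.5),
             y + offset * math.sin(heading - 0.5))
    s_l = sensor_intensity(left, source)
    s_r = sensor_intensity(right, source)

    side, sign = wiring
    if side == "cross":          # each sensor drives the opposite wheel
        s_l, s_r = s_r, s_l
    if sign == "excite":         # Vehicle 2: stronger light -> faster wheel
        m_l, m_r = s_l, s_r
    else:                        # Vehicle 3: stronger light -> slower wheel
        m_l, m_r = max(0.0, 1.0 - s_l), max(0.0, 1.0 - s_r)

    v = 0.5 * (m_l + m_r)                  # forward speed
    omega = (m_r - m_l) / (2.0 * offset)   # turn rate from wheel difference
    return (x + v * math.cos(heading) * dt,
            y + v * math.sin(heading) * dt,
            heading + omega * dt)

# Vehicle 2a ("Coward"): same-side excitatory wiring steers it away.
x, y, h = -1.0, 0.3, 0.0
for _ in range(400):
    x, y, h = step(x, y, h, source=(1.0, 0.0), wiring=("same", "excite"))
print(f"Vehicle 2a ends at ({x:.2f}, {y:.2f})")
```

Swapping the wiring argument reproduces the qualitative characters of the four vehicles discussed above; replacing these linear maps with a non-monotonic sensory–motor map would correspond to Vehicle 4.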
5.3.2 Behavior-Based Robots and Their Limitations
Returning now to behavior-based robotics, Brooks elaborated on thoughts similar to Braitenberg’s by demonstrating that even small and extremely simple insect-like robots could exhibit far more complex, realistic, and intelligent behaviors than the conventional computationally heavy robots used in traditional AI research. This marked the beginning of behavior-based robotics research. Argumentative papers published by Brooks, such as “Elephants don't play chess” (Brooks, 1990) and “Intelligence without representation” (Brooks, 1991), present his thoughts on what he calls “classical AI” and “nouvelle AI.” He has criticized the use of large robots programmed with classical AI schemes, arguing that a lot of the computation time is spent on logical inference or the preparation of action plans in real-world tests even before taking a single step or indeed making any movement at all. On the other hand, small robots whose behavior is based on the philosophy of nouvelle AI are designed to move first, taking part in physical interactions with their environment and with humans while computing all the necessary parameters in real time in an event-based manner. Brooks also criticizes the tendency of classical AI to be overwhelmed with “representation.” For example, typical mobile robots based on the classical AI scheme are equipped with global maps or environment models represented in a three-dimensional Cartesian coordinate system. The robots then proceed to match what they have sensed through devices such as vision cameras with the stored representation through complicated coordinate transformations for each step of their movement as they find their location in the stored Cartesian coordinate system. The behavior-based robots made by Brooks and his students use only a simple scheme based on the perception-to-motor cycle, in which the motor outputs are directly mapped from the perceptual inputs at each iteration. The problem with the classical AI approach is that the representation is prepared not through actual actions taken by the agent (the robot), but by implementing an externally imposed artificial purpose. This problem can be attributed to the lack of direct experience, which is related to Husserl’s discussions on phenomenological reduction (see chapter 3). Behavior-based robotics could provide AI researchers and cognitive scientists with a unique means to obtain a view on first-person experience from the viewpoint of a robot by almost literally putting themselves
inside its head, thereby affording the opportunity to examine the sensory flow experienced by the robot. Readers should note that the idea of the perception-to-motor cycle with small controllers in behavior-based robots and Braitenberg vehicles is quite analogous to the aforementioned Gibsonian theories emphasizing the role of the environment rather than internal brain mechanisms (also see Bach, 1987). Behavior-based approaches that emphasize embodiment currently dominate the field of robotics and AI (Pfeifer & Bongard, 2006). Although this paradigm shift made by the behavior-based robotics researchers is deeply significant, I feel a sense of discomfort that the common use of this approach emphasizes only sensory–motor level interactions. This is because I still believe that we humans have the “cogito” level that can manipulate our thoughts and actions by abstracting our daily experiences from the sensory–motor level. Actually, Brooks and his students examined this view in their experiments applying the behavior-based approach to the robot navigation problem (Matarić, 1992). The behavior-based robots developed by Brooks’ lab employed the so-called subsumption architecture, which consists of layers of competencies or task-specific behaviors that subsume lower levels. Although in principle each behavior functions independently by accessing sensory inputs and motor outputs, behaviors in the higher layers subsume those in the lower ones by sending suppression and inhibition signals to their sensory inputs and motor outputs, respectively. A subsumption architecture employed for the navigation task is shown in Figure 5.13. The subsumption control of behaviors allocated to different layers includes avoiding obstacles, wandering and exploring the environment, and map building and planning. Of particular interest in this architecture is the top layer module that deals with map building and planning.
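As a rough illustration of the layering idea (and only that), the sketch below arbitrates among three hypothetical behaviors in a priority loop. The behavior names, the sensation fields, and the winner-takes-the-motor arbitration are all simplifying assumptions; Brooks’ architecture wired suppression and inhibition into the signal lines themselves rather than arbitrating in a central loop like this.

```python
# A rough sketch of layered behaviors in the spirit of Figure 5.13. The
# behavior set, the sensation fields, and the priority-loop arbitration are
# simplifying assumptions, not Brooks' implementation.

def avoiding(sensation):
    """Bottom layer: reflexively turn away from a nearby obstacle."""
    if sensation["obstacle_distance"] < 0.3:
        return {"forward": 0.0, "turn": 1.0}
    return None  # inactive

def wandering(sensation):
    """Middle layer: always proposes drifting forward."""
    return {"forward": 0.5, "turn": 0.1}

def exploring(sensation):
    """Upper layer: steer toward a frontier heading when one is known."""
    heading = sensation.get("frontier_heading")
    if heading is None:
        return None
    return {"forward": 0.7, "turn": heading}

def subsume(sensation, layers):
    """Each active layer subsumes (here: overrides) the ones below it."""
    command = {"forward": 0.0, "turn": 0.0}
    for layer in layers:  # ordered bottom to top, as in Figure 5.13
        out = layer(sensation)
        if out is not None:
            command = out
    return command

layers = [avoiding, wandering, exploring]
print(subsume({"obstacle_distance": 1.0}, layers))                           # wanders
print(subsume({"obstacle_distance": 1.0, "frontier_heading": 0.4}, layers))  # explores
```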
Figure 5.13. The subsumption architecture used for the robot navigation problem in research by Brooks and colleagues.
This layer, which corresponds to the cogito level, is supposed to generate abstract models of the environment through behavioral experiences and to use these in goal-directed action planning. An important remaining problem concerns the ways that acquired models or maps of the environment are represented. Daniel Dennett points to this problem when writing “The trouble is that once we try to extend Brooks’ interesting and important message beyond the simplest of critters (artificial or biological), we can be quite sure that something awfully like representation is going to have to creep in …” (Dennett, 1993, p. 126). The scheme by Matarić (1992) employed a topological graph representation for the environment map consisting of nodes representing landmark types and arrows representing their transitions in the course of traveling (see Figure 2.2). As long as symbols understood to be arbitrary shapes of tokens (Harnad, 1990) are used in those nodes for representing the world, they can hardly be grounded in the physical world in a metric space common to the physical world, as discussed earlier. In light of this, what direction of research should behavior-based robotics researchers pursue? Should we give up involving the cogito level or accept the usage of symbols for incorporating cogito-level activities, bearing in mind potential inconsistencies? Actually, a clue to resolving this dichotomy can be found in one of Braitenberg's vehicles, Vehicle 12. Although Braitenberg vehicles up to Vehicle 4 have been introduced in numerous robotics and AI textbooks, the thought experiments beyond Vehicle 4, which target higher-order cognitive mechanisms, are equally interesting. These higher-order cognitive vehicles concern logic, concepts, rules, regularities, and foresight. Among them, Vehicle 12 examines how a train of thought can be generated. Braitenberg implemented a nonlinear dynamical system, a logistic map (see section 5.1), into the vehicle that enables sequences of values or “thoughts” in terms of neuronal activation to be generated in an unpredictable manner but with hidden regularity by means of chaos. Braitenberg argues that this vehicle seems to possess free will to manipulate thoughts, at least from the perspective of outside observers of the vehicle. We will come back to this consideration in later chapters, as the issue of free will constitutes one of the main focuses of this book. So far, we have seen that Gibsonian and Neo-Gibsonian researchers as well as behavior-based robotics researchers who emphasize embodied cognition tend to regard the role of the brain as only that of a minimal
controller. This is because even very primitive controllers like the Braitenberg vehicles can generate quite complex behaviors when coupled with environmental stimuli. It is only natural to expect that even higher-order cognition might emerge to some extent if further nonlinearity (like that employed in Vehicle 12) or some adaptability could be added to the controller. Now, we begin to consider minimal forms of an artificial brain, namely neural network models that are characterized by their nonlinearity and adaptability, when put into robot heads. Note, however, that these attempts do not accord with our knowledge that the brain is a complex organ, as we have seen in previous chapters. So, let’s contemplate first how this discordance can be resolved.
5.4. Modeling the Brain at Different Levels
As for general understanding, neural activity in the brain can be described on the basis of processes that occur at multiple levels, starting from the molecular level (which accounts for processes such as protein synthesis and gate opening in synapses), the neurochemical level (which accounts for signal transmission), the single cell activity level (which accounts for processes such as spiking), and the cell assembly level in local circuits through to the macroscopic regional activation level measurable with technologies such as fMRI or EEG. The target level depends on the phenomenon to be reproduced. If we aim to model the firing activity of a single cell, we describe precisely how the membrane potential changes as a result of ion flow in a single neuron. If we aim to model neuron interconnection phenomena, as observed in the hippocampus by Ikegaya and colleagues (2004) using optical recording techniques, the model should focus on how spiking activity can spread across local circuits consisting of thousands of interconnected neurons. On the other hand, if we aim to model neural processing related to the generation of cognitive behavior, it would not be a good idea to model a single spiking neuron. Rather, such modeling would require the reproduction of interactions between multiple brain regions to simulate the activities of tens of billions of spiking neurons, something that is impossible to perform with computer technology currently available to us. Another problem besides computational power is the operation and
maintenance of such a tremendously complex simulator, as well as techniques for processing the results of simulations. In fact, using supercomputers to reproduce neural circuits in the brain presents some considerable challenges in terms of making the simulation realistic. At present, we can obtain experimental data about connectivity between different types of neurons by using techniques such as labeling individual neurons with distinctly colored immunofluorescence markers appearing in specially modified transgenic animals. These labeled neurons can be traced by confocal microscopy for each section of the sampled tissue, and eventually a three-dimensional reproduction of the entire system of interconnected neurons can be prepared by stacking a number of the images. For example, the Blue Brain project led by Henry Markram (Markram et al., 2015) reconstructed the microcircuitry of the rat somatosensory neocortex, consisting of about 31,000 neurons, in a digital computer model. This simulation coped with neurophysiological details such as the reconstruction of the firing properties of 207 morpho-electrical types of neural cells in the circuit. The project is now attempting to reproduce the entire visual cortex, which consists of about a million columns, each of which consists of about 10,000 cells. If this is achieved, it may also be possible to create a cellular-level replica of the entire brain! Of course, such an accomplishment would provide us with vast amounts of scientific insight. At the same time, however, I wonder how tractable such a realistic brain simulator would be. I imagine that for a realistic replica of the brain to function properly, it might also require realistic interactions with its environment. Therefore, it should be connected to a physical body of some sort to attain equally realistic sensory–motor interactions with the environment. It may take several years for the functions of a human-level brain replica to develop to a sufficiently high level by being exposed to realistic sensory–motor interactions, as we know that the development of cognitive capabilities in human infants requires a comparably long period of intensive parental care. Also, if a human-level brain replica must be embedded in various social contexts in human society to ensure its proper development, such an experiment may not be feasible for various other reasons, including ethical problems associated with building such creatures. These issues will arise again in the final chapters of this book. If the goal of modeling, though, is to build not a complete replica of the human brain but rather an artifact for synthesis and analysis that
can be used to obtain a better understanding of the human mind and cognition in general in terms of its organizational and functional principles, such models must be built with an adequate level of abstraction to facilitate their manipulability. Analogously, Herbert Simon (1981) wrote that we might hope to be able to characterize the main properties of a system and its behavior without elaborating the detail of either the outer or inner environments when modeling humans. Let us remember the analytical results obtained by Churchland and colleagues (2010) showing that the principal dimensions of ensembles of neuronal firing can be reduced to a few, as introduced in chapter 4. Then, it might be reasonable to assume that the spiking of some hundreds of neurons can be reproduced by simulating the activities of a few representative neural units modeled as point masses. An interesting observation is that the macroscopic state of collective neural activity changes continuously and rather smoothly in low-dimensional space, even though the activity of each neuron at each moment is discontinuous and noisy in regard to spiking. So, cognition and behavior might just correlate with this macroscopic state, which changes continuously in a space whose dimensionality is several orders lower than the original dimensionality of the space of spiking neurons. Consequently, it might be worthwhile to consider a network model consisting of a set of interacting units in which each unit essentially represents a single dimension of the original collective activity of the spiking neurons. Actually, this type of abstraction has been assumed in the connectionist approach, which is described in detail in the seminal book Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart, McClelland, and the PDP Research Group (1986). They showed that simple network models consisting of sets of activation units and connections can model various cognitive processes, including pattern matching, dynamic memory, sequence generation-recognition, and syntax processing in distributed activation patterns of the units. Those cognitive processes are emergent properties of the interactive dynamics within networks, which result from the adjustment of connectivity weights between the different activation units caused by learning. Among the various types of connectionist network models proposed, I find particularly interesting a dynamic neural network model called the recurrent neural network (RNN) (Jordan, 1986; Elman, 1990; Pollack, 1991). It is appealing because it can deal with both spatial and temporal information structures by utilizing its own dynamic properties.
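As an aside, the dimensionality-reduction intuition above is easy to demonstrate on synthetic data. The sketch below is an illustration of the general idea only, not Churchland and colleagues’ analysis pipeline: noisy high-dimensional “population activity” driven by two slow latent signals collapses onto a few principal components.

```python
# Synthetic demonstration only: noisy high-dimensional "population activity"
# secretly driven by two slow latent signals collapses onto a few principal
# components. An illustration of the general idea, not Churchland and
# colleagues' analysis pipeline.
import numpy as np

rng = np.random.default_rng(0)
T, n_units, n_latent = 500, 200, 2
t = np.linspace(0.0, 10.0, T)
latents = np.stack([np.sin(t), np.cos(0.5 * t)], axis=1)  # slow macroscopic state
mixing = rng.normal(size=(n_latent, n_units))             # random projection
activity = latents @ mixing + 0.5 * rng.normal(size=(T, n_units))  # noisy units

centered = activity - activity.mean(axis=0)               # PCA via SVD
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance captured by the first 3 PCs:", np.round(explained[:3], 3))
# Most of the variance lands in ~2 components: the macroscopic state is far
# lower dimensional than the 200 noisy units suggest.
```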
However, the most important characteristic of RNNs is their generality. As we proceed, we’ll see that RNNs, even in their minimal form, can exhibit general cognitive functions of learning, recognizing, and generating continuous spatiotemporal patterns that achieve generalization and compositionality while also preserving context sensitivity. These unique characteristics of RNNs are due to the fact that they are nonlinear dynamical systems with high degrees of adaptability. It is well known that any computational process can be reconstructed by nonlinear dynamical systems as long as their parameters are adequately set (Crutchfield & Young, 1989). A study by Hava Siegelmann (1995) has established the possibility that analog computations by RNNs can exhibit an ultimately complex computational capability that is beyond the Turing limit. This can be understood by the fact that a nonlinear dynamical system can exhibit complexity equivalent to an infinite state machine depending on its parameters, as described in section 5.1. Next, we start by looking at a simpler neural network, the feed-forward network, which can learn input-output mapping functions for static patterns. Then, we show how this feed-forward network can be extended to RNNs, which can learn spatiotemporal patterns. At the same time, we examine the basic characteristics of the RNN model from the perspective of nonlinear dynamical systems.
5.5. Neural Network Models
This section introduces three types of basic neural network models: the three-layered feed-forward network, the discrete-time RNN, and the continuous-time RNN (CTRNN). All three types have two distinct modes of operation. One is the learning mode for determining a set of optimal connectivity weights from a training dataset, and the other is the testing mode in which an optimal output pattern is generated from an example test input pattern.
5.5.1 The Feed-Forward Network Model
The feed-forward network model is shown in Figure 5.14. It consists of an input unit layer, a hidden unit layer, and an output unit layer. Neural activations propagate from the input units to the hidden units and to the output units through the connectivity weights spanning between
Figure 5.14. The feed-forward network model. Feed-forward activation and error back-propagation schemes are illustrated in the model. The right side of the figure shows how the delta error and the updated weights can be calculated through the error back-propagation process from the output layer to the hidden layer.
each layer. The objective of learning is to determine a set of optimal connectivity weights that can reconstruct the input-output patterns given in the target training dataset. The learning is conducted by utilizing the error back-propagation scheme, which was conceived independently by Shun-Ichi Amari (1967), Paul Werbos (1974), and Rumelhart and colleagues (1986). We assume that the network consists of input units (indexed with $k$), hidden units (indexed with $j$), and output units (indexed with $i$) and is trained to produce input-output mappings for $P$ different patterns. The activations of the units when presented with the $n$th pattern are denoted as $in_n^k$, $a_n^j$, and $o_n^i$, respectively, where $in_n^k$ is given as input. The potentials of the hidden and output units are denoted as $u_n^j$ and $u_n^i$, respectively, and the training target of the $n$th pattern is denoted as $\bar{o}_n^i$. Thus, the forward activation of an output unit is written as:
$$u_n^i = \sum_j w_{ij}\, a_n^j + b_n^i \qquad \text{(Eq. 7a)}$$
$$o_n^i = f(u_n^i) \qquad \text{(Eq. 7b)}$$
where $b_n^i$ is a bias value for each unit and $f$ is a sigmoid function. Similarly, for the hidden units:
$$u_n^j = \sum_k w_{jk}\, in_n^k + b_n^j \qquad \text{(Eq. 8a)}$$
$$a_n^j = f(u_n^j) \qquad \text{(Eq. 8b)}$$
Here, the goal of learning is to minimize the squared error between the target and the output, as shown in Eq. 9:
$$E_n = \frac{1}{2} \sum_i (\bar{o}_n^i - o_n^i)(\bar{o}_n^i - o_n^i) \qquad \text{(Eq. 9)}$$
First, we formulate how to update the connection weights in the output layer, which are denoted as $\Delta w_{ij}$. Because the weights should be updated in the direction of minimizing the squared error, the direction can be obtained by taking the derivative of $E_n$ with respect to $w_{ij}$ as follows:
$$\Delta w_{ij} = -\varepsilon \frac{\partial E_n}{\partial w_{ij}}$$
The right side of this equation can be decomposed as:
$$-\varepsilon \frac{\partial E_n}{\partial w_{ij}} = -\varepsilon \frac{\partial E_n}{\partial u_n^i} \cdot \frac{\partial u_n^i}{\partial w_{ij}}$$
By applying Eq. 7a to the second derivative on the right side, we obtain
$$-\varepsilon \frac{\partial E_n}{\partial w_{ij}} = -\varepsilon \frac{\partial E_n}{\partial u_n^i} \cdot a_n^j \qquad \text{(Eq. 10)}$$
Here, $\frac{\partial E_n}{\partial u_n^i}$ is the delta error of the $i$th unit, which is denoted as $\delta_n^i$. The delta error represents the contribution of the potential value of the unit to the squared error:
$$\delta_n^i = \frac{\partial E_n}{\partial u_n^i} = \frac{\partial E_n}{\partial o_n^i} \cdot \frac{\partial o_n^i}{\partial u_n^i}$$
By applying Eq. 9 to the first term on the right side and taking the derivative of the sigmoid function with respect to the potential for the second term, the delta error at the $i$th unit can be obtained as follows:
$$\delta_n^i = -(\bar{o}_n^i - o_n^i) \cdot o_n^i \cdot (1 - o_n^i) \qquad \text{(Eq. 11)}$$
Furthermore, by utilizing the delta error in Eq. 10, the updated weight can be written as:
$$\Delta w_{ij} = -\varepsilon\, \delta_n^i \cdot a_n^j \qquad \text{(Eq. 12)}$$
Next, we obtain the updated connection weights of the hidden layer, denoted as $\Delta w_{jk}$, by taking the derivative of $E_n$ with respect to $w_{jk}$:
$$\Delta w_{jk} = -\varepsilon \frac{\partial E_n}{\partial w_{jk}} = -\varepsilon \frac{\partial E_n}{\partial u_n^j} \cdot \frac{\partial u_n^j}{\partial w_{jk}}$$
By substituting $\frac{\partial E_n}{\partial u_n^j}$ with the delta error at the $j$th unit, $\delta_n^j$, and evaluating $\frac{\partial u_n^j}{\partial w_{jk}}$ as $in_n^k$ by applying Eq. 8a, the updated weights can be written as:
$$\Delta w_{jk} = -\varepsilon\, \delta_n^j \cdot in_n^k \qquad \text{(Eq. 13)}$$
Here, $\delta_n^j$ can be derived from the previously obtained $\delta_n^i$ as follows:
$$\delta_n^j = \frac{\partial E_n}{\partial u_n^j} = \sum_i \frac{\partial E_n}{\partial u_n^i} \cdot \frac{\partial u_n^i}{\partial a_n^j} \cdot \frac{\partial a_n^j}{\partial u_n^j} = \sum_i (\delta_n^i \cdot w_{ij}) \cdot a_n^j \cdot (1 - a_n^j) \qquad \text{(Eq. 14)}$$
It should be noted that $\sum_i (\delta_n^i \cdot w_{ij})$ on the right side represents the sum of the delta errors $\delta_n^i$ back-propagated to the $j$th hidden unit, each multiplied by its connection weight $w_{ij}$. If there are more layers, the same error back-propagation scheme is repeated, in the course of which (1) the delta error at each unit in the current layer is obtained by back-propagating the error from the previous layer through the connection weights and (2) the incoming connection weights to the units in the
current layer are updated by using the obtained delta errors. The actual process of updating the connection weights is implemented through summation of each update for all training patterns:
$$w^{new} = w^{old} + \sum_{n=1}^{P} \Delta w_n \qquad \text{(Eq. 15)}$$
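Taken together, Eqs. 7 through 15 specify a complete training procedure. The following is a minimal NumPy sketch of that procedure; the XOR dataset, the network size, and the learning rate are illustrative assumptions, not from the text, with the delta errors and weight updates written to mirror Eqs. 11 through 15.

```python
# A minimal NumPy sketch of Eqs. 7-15; the XOR dataset, network size, and
# learning rate are illustrative assumptions, not from the text.
import numpy as np

def f(u):
    """Sigmoid activation used in Eqs. 7b and 8b."""
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)
inputs = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # in_n^k
targets = np.array([[0.], [1.], [1.], [0.]])                 # targets \bar{o}_n^i
n_hidden, eps = 4, 1.0

w_jk = rng.normal(scale=1.0, size=(n_hidden, 2))  # input -> hidden weights
b_j = np.zeros(n_hidden)
w_ij = rng.normal(scale=1.0, size=(1, n_hidden))  # hidden -> output weights
b_i = np.zeros(1)

for epoch in range(10000):
    # Forward activation (Eqs. 7 and 8), batched over all P patterns.
    a = f(inputs @ w_jk.T + b_j)   # hidden activations a_n^j
    o = f(a @ w_ij.T + b_i)        # outputs o_n^i

    # Delta errors (Eqs. 11 and 14).
    delta_i = -(targets - o) * o * (1.0 - o)
    delta_j = (delta_i @ w_ij) * a * (1.0 - a)

    # Weight updates summed over all patterns (Eqs. 12, 13, and 15).
    w_ij += -eps * delta_i.T @ a
    b_i += -eps * delta_i.sum(axis=0)
    w_jk += -eps * delta_j.T @ inputs
    b_j += -eps * delta_j.sum(axis=0)

print(np.round(o.ravel(), 2))  # should approach [0, 1, 1, 0]
```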
5.5.2 Recurrent Neural Network Models
Recurrent neural network models have been used to investigate the human cognitive capability of dealing with temporal processes such as in motor control (Jordan, 1986) and language learning (Elman, 1990). Let’s look at the exact form of the RNN model. Although various types of RNNs have been investigated so far (Jordan, 1986; Doya & Yoshizawa, 1989; Williams & Zipser, 1989; Elman, 1990; Pollack, 1991; Schmidhuber, 1992), it might be helpful to look at the Jordan-type RNN (Jordan, 1986), as it is one of the simplest implementations, illustrated in Figure 5.15. This model has context units in addition to the current step input $in_t$ and the next step output $out_{t+1}$. The context units represent the internal state for representing dynamic sequence patterns. In the forward dynamics, the current step context unit activation $c_t$ is mapped to its next step activation $c_{t+1}$. Let us consider an example of learning to generate a simple 1-dimensional cyclic sequence pattern of period 3 such as “0 0 1 0 0
Figure 5.15. Jordan-type RNN. (a) Forward activation and (b) the error back-propagation through time scheme in the cascaded RNN.
1 … 0 0 1 … .” In this example, the input is given as a sequence of “0 0 1 0 0 1 … 0 0” and the target output is given as this sequence shifted forward one step, “0 1 0 0 1 0 … 0 1.” The learning of this type of sequence faces the hidden state problem because the sequences include the same target output value at different positions in the sequence (i.e., two 0s in the first step and the second step of this cyclic sequence pattern). Although this type of sequence cannot be learned by the feed-forward network by means of simple input-output mapping, the RNN model with the context units can learn it if the context unit activation states can be differentiated from ambiguous outputs; that is, a 1-dimensional context activation sequence is formed such as “0.2 0.4 0.8 0.2 0.4 0.8 … 0.2 0.4 0.8,” which is mapped to the output activation sequence of “0 0 1 0 0 1 … 0 0 1.” It is noted that the Jordan-type RNN operated in discrete time steps can be regarded as a dynamical map, as shown in Eq. 2 in section 5.1, by considering that the current state $X_t$, consisting of the current step input and the current context state, is mapped to the next state $X_{t+1}$, consisting of the input and the context at the next step. The connectivity weights correspond to the parameter $P$ of the dynamical map. Therefore, the RNN model can acquire desired dynamic structures by adequately tuning the connectivity weights as the learnable parameters of the dynamical map. For example, the aforementioned cyclic pattern of repeating “0 0 1” can be learned as a limit cycle attractor of period 3. One of the important characteristics of RNNs is that they can exhibit dynamic activities autonomously, without receiving any inputs, when operated in closed loop by feeding the prediction output for the next step to the input of the current step. This phenomenon is explained qualitatively by Maturana and Varela (1980), who state that neural circuits are closed circuits without any input or output functions. Closed circuits maintain endogenous dynamics that are structurally coupled with sensory inputs and produce motor outputs, wherein sensory inputs are considered to be perturbative inputs to the endogenous dynamics. Although it is tempting to think that motor outputs are generated simply by mapping different sensory states to sensory reflexes in different situations, this process should in fact involve additional steps, including utilization of the autonomous internal dynamics of the closed network. Iterative interactions between interconnected neural units afford the RNN a certain amount of autonomy, which might well constitute the origin of voluntary or contextual sensitivity. Later sections return to this point as we focus on the issue of free will.
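To make the example concrete, here is a minimal NumPy sketch (an assumed toy implementation, not the book’s code) of a Jordan-type RNN trained on the period-3 task above, using the back-propagation-through-time scheme described next: the delta error arriving at the context input units of one step is copied back to the context output units of the previous step.

```python
# A minimal sketch of a Jordan-type RNN trained by BPTT on the period-3
# task; the architecture sizes and the learning rate are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_ctx, n_hid, T, eps = 3, 8, 30, 0.05
seq = np.tile([0.0, 0.0, 1.0], T // 3)  # input sequence "0 0 1 0 0 1 ..."
tgt = np.roll(seq, -1)                  # target: the sequence shifted one step

sig = lambda u: 1.0 / (1.0 + np.exp(-u))
W1 = rng.normal(scale=0.5, size=(n_hid, 1 + n_ctx)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(1 + n_ctx, n_hid)); b2 = np.zeros(1 + n_ctx)

for epoch in range(5000):
    # Forward pass, storing activations for the backward sweep.
    c = np.full(n_ctx, 0.5)
    zs, hs, outs = [], [], []
    for t in range(T):
        z = np.concatenate(([seq[t]], c))
        h = sig(W1 @ z + b1)
        out = sig(W2 @ h + b2)  # out[0]: prediction; out[1:]: next context
        zs.append(z); hs.append(h); outs.append(out)
        c = out[1:]

    # Backward pass through time.
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    delta_c = np.zeros(n_ctx)  # no error flows in after the final step
    for t in reversed(range(T)):
        out = outs[t]
        delta_out = np.empty(1 + n_ctx)
        delta_out[0] = -(tgt[t] - out[0]) * out[0] * (1.0 - out[0])  # Eq. 11
        delta_out[1:] = delta_c * out[1:] * (1.0 - out[1:])  # copied delta
        delta_h = (W2.T @ delta_out) * hs[t] * (1.0 - hs[t])
        dW2 += np.outer(delta_out, hs[t]); db2 += delta_out
        dW1 += np.outer(delta_h, zs[t]); db1 += delta_h
        delta_c = (W1.T @ delta_h)[1:]  # to the context outputs of step t-1
    W2 -= eps * dW2; b2 -= eps * db2
    W1 -= eps * dW1; b1 -= eps * db1

# Predictions should settle near the shifted sequence "0 1 0 0 1 0 ...".
print(np.round([out[0] for out in outs[:6]], 2))
```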
The RNN employs a learning scheme, back-propagation through time (BPTT) (Rumelhart et al., 1986; Werbos, 1988), which has been developed by extending the conventional error back-propagation scheme in the backward time direction to develop adequate dynamic activation patterns in the context units. In the aforementioned feed-forward network model, the connectivity weights between the output layer and the hidden layer are updated by using the error generated between the target output and the generated output. Then the connectivity weights between the input layer and the hidden layer are updated using the delta error back-propagated from the output units to the hidden units. In the case of the RNN, however, there are no error signals for the context output units because there are no target values for them. Therefore, there is no direct means to update the connectivity weights between the context output units and the hidden units. However, in this situation, if the delta error back-propagated from the hidden units to the context input units is copied to the context output units in the previous step, the connectivity weights between the context output units and the hidden units can be updated by utilizing this copied information. This scheme of BPTT can be well understood by supposing that an identical RNN is cascaded in the direction of time to form a deep feed-forward network, as shown in Figure 5.15b. In this cascaded network, the current step activation of the context output units is copied to the context input units in the next step, which is repeated from the start step to the end step in the forward computation. On the other hand, in the backward computation of BPTT, the error generated in the output units at a particular time step is propagated through the context input units to the context output units of the previous step, which is repeated until the delta error signal reaches the context input units in the start step. In BPTT, the error signals originating from the output units of different time steps are accumulated as the time steps fold back, by which all the connectivity weights of the identical RNN can be updated. The capability of the RNN for self-organizing context-dependent information processing can be well understood by looking at a prominent research outcome presented by Jeffrey Elman (1991) on the topic of language learning utilizing RNN models. He showed that a version of the RNN, now called an Elman net (Figure 5.16a), can learn to extract grammatical structures from given exemplar sentences. In his simulation experiment, the example sentences for training the network were generated by using a lexicon of 23 items including 8 nouns, 12 verbs, the relative pronoun
Figure 5.16. Sentence learning experiments done by Elman. (a) The Elman network, with a next-word prediction output, the current word input, the current context, and a context loop; (b) the context-free grammar employed: S → NP VP “.”; NP → PropN | N | N RC; VP → V (NP); RC → who NP VP | who VP; N → boy | girl | cat | dog | boys | girls | cats | dogs; PropN → John | Mary; V → chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives; (c) an example sentence generated from the grammar, “dog who boys feed sees girl” (Elman, 1991).
“who,” and a period for indicating the ends of sentences. The sentence generation followed a context-free grammar, shown in Figure 5.16b. As described in chapter 2, various sentences can be generated by recursively applying substitution rules starting from S as the top of the tree representing the sentence structure. In particular, the presence of a relative clause with “who” allows generation of recursively complex sentences such as: “Dog who boys feed sees girl.” (See Figure 5.16c.) In the experiment, the Elman network was used for the generation of successive predictions of words in sentences based on training with exemplar sentences. More specifically, words were input one at a time at each step, and the network predicted the next word as the output. After the prediction, the correct target output was shown and the resultant prediction error was back-propagated, thereby adapting the connectivity weights. At the end of each sentence, the first word of the next sentence was input. This process was repeated for thousands of the exemplar sentences generated from the aforementioned grammar. It is noted that the Elman network in this experiment employed a local representation in the winner-take-all way, using a 31-bit vector for both the input and the output units. A particular word was represented by the activation of a corresponding unit out of the 31 units. The input and the output units had the same representation. The analysis of network performance after the training of the target sentences showed various interesting characteristics of the network
behaviors. First, look at simple sentence cases. When the singular noun “boy” was input, all three singular verb categories as well as “who” for a relative clause were activated as possible predicted next words, and all other words were not activated at all. On the other hand, when the plural noun “boys” was input, all plural verbs and “who” were activated. This means that the network seems to capture the singular–plural agreement between subject nouns and verbs. Moreover, the actual activation values encoded the probability distribution of next-coming words, because “boy” or “boys” alone cannot determine the next words deterministically. It was also observed that the network captures verb argument structures as well. For example, after the two succeeding words “boy lives” are input, a period is predicted. For the case of “boy sees,” both a period and noun words were activated for the next prediction. Finally, for the case of “boy chases,” only noun words were activated. The network seems to understand that “live” and “chase” are an intransitive verb and a transitive verb, respectively. It also understands that “see” can be both. Although the presence of a relative clause makes a sentence more complex, singular–plural agreements were preserved. An example exists in the following paired sentences:
1. boy who boys chase chases boy
2. boys who boys chase chase boy
Actually, the network activated the singular verbs after being given the input “boy who boys chase” and activated the plural ones after being given “boys who boys chase.” To keep singular–plural agreement between subjects and distant verbs, the singular/plural information of the subjects had to be preserved internally. Elman found that context activation dynamics can be adequately self-organized in the network for this purpose.
5.5.3 Continuous Time Recurrent Neural Network Model
Next, we look at an RNN model operated in continuous time, which is known as a continuous-time recurrent neural network (CTRNN) (Doya & Yoshizawa, 1989; Williams & Zipser, 1989). Let us consider a CTRNN model without an explicit layer structure, in which each neural unit has synaptic inputs from all other neural units and also from its own feedback (see Figure 5.17a as an example). In this model, the activation dynamics of each neural unit can be described in terms of the differential equations shown in Eq. 16, equations in which each neural
unit has synaptic inputs from all other neural units as well as from its own feedback.
$$\tau\, \dot{u}^i = -u^i + \sum_j w_{ij}\, a^j + I^i \qquad \text{(Eq. 16a)}$$
$$a^i = 1/(1 + e^{-u^i}) \qquad \text{(Eq. 16b)}$$
The left side of Eq. 16a represents the time differential of the potential of the $i$th unit multiplied by a time constant $\tau$, which is equated with the sum of the synaptic inputs plus the decay term $-u^i$. This means that positive and negative synaptic inputs increase and decrease the potential of the unit, respectively. If the sum of synaptic inputs is zero, the potential converges toward zero. The time constant $\tau$ plays the role of a viscous damper with its positive value. The larger or smaller the time constant $\tau$, the slower or faster the change of the potential $u^i$. You may notice that this equation is analogous to Eq. 3, representing a general form of the continuous-time dynamical system. Next, let’s examine the dynamics of CTRNNs. Randall Beer (1995a) showed that even a small CTRNN consisting of only three neural units can generate complex dynamical structures depending on its parameters, especially the values of the connection weights. The CTRNN model examined by Beer consists of three neural units as shown in Figure 5.17a. Figure 5.17b–d shows that different attractor configurations can appear depending on the connection weights. An interesting observation is that multiple attractors can be generated simultaneously with a given specific connection weight matrix. The eight stable fixed-point attractors and the two limit-cycle attractors appear with each specific connection weight, as shown in Figure 5.17b and c, respectively, and the attractor towards which the state trajectories converge depends on the initial state. In Figure 5.17d, a single chaotic attractor appears with a different connection weight matrix. This type of complexity in attractor configurations might be the result of mutual nonlinear interactions between multiple neural units. In summary then, CTRNNs can autonomously generate various types of dynamic behaviors ranging from simple fixed-point attractors through limit cycles to complex chaotic attractors, depending on the parameters represented by connection weights (this characteristic is the same for the discrete-time RNN as well [Tani & Fukumura, 1995]). This feature can be used for memorizing multiple temporal patterns of perceptual signals or movement sequences, which will be especially important as we consider MTRNNs later.
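The qualitative behavior Beer describes is easy to probe numerically. Below is a minimal sketch of a three-unit CTRNN integrating Eq. 16 with a simple Euler step; the weight matrix, inputs, and time constants are arbitrary illustrative choices, not Beer’s published parameter sets, so they will produce some attractor but not necessarily the configurations of Figure 5.17.

```python
# A minimal sketch of a three-unit CTRNN: Eq. 16 integrated with an Euler
# step. The weights, inputs, and time constants are arbitrary illustrative
# choices, not Beer's published parameter sets.
import numpy as np

def ctrnn_run(W, I, tau, u0, dt=0.05, steps=4000):
    """Integrate tau_i * du_i/dt = -u_i + sum_j w_ij a_j + I_i, a = sigmoid(u)."""
    u = np.array(u0, dtype=float)
    trajectory = np.empty((steps, len(u)))
    for k in range(steps):
        a = 1.0 / (1.0 + np.exp(-u))
        u = u + dt * (-u + W @ a + I) / tau
        trajectory[k] = u
    return trajectory

W = np.array([[ 5.0, -8.0,  0.0],
              [ 8.0,  5.0,  0.0],
              [ 0.0,  2.0,  4.0]])
I = np.array([-3.0, -6.0, -2.0])   # constant external inputs (biases)
tau = np.array([1.0, 1.0, 2.0])

# Different initial states may converge to different coexisting attractors.
for u0 in ([0.0, 0.0, 0.0], [4.0, -4.0, 2.0]):
    traj = ctrnn_run(W, I, tau, u0)
    print(u0, "->", np.round(traj[-1], 2))
```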
Figure 5.17. Different attractor configurations appear in the dynamics of a continuous-time RNN model consisting of three neural units receiving synaptic inputs from the other two neural units as well as their own recurrent ones. (a) The network architecture, (b) eight stable fixed-point attractors denoted as black points, (c) two limit cycles denoted as line circles with arrows, and (d) chaotic attractors. (b), (c), and (d) are adapted from Beer (1995a) with permission.
In the case of a CTRNN characterized by the time constant parameter $\tau$, the BPTT scheme for supervised learning is used with slight modifications to the original form. Figure 5.18 illustrates how the BPTT scheme can be implemented in a CTRNN. First, the forward activation dynamics of the CTRNN for $n$ steps is computed by following Eq. 17 with a given initial neural activation state at each unit. Eq. 17 is obtained by converting Eq. 16 from a differential equation form into a difference equation by using Euler’s method for the purpose of numerical computation.
$$u_t^i = \left(1 - \frac{1}{\tau^i}\right) u_{t-1}^i + \frac{1}{\tau^i}\left(\sum_j w_{ij}\, a_{t-1}^j + I_{t-1}^i\right) \qquad \text{(Eq. 17a)}$$
$$a_t^i = 1/(1 + e^{-u_t^i}) \qquad \text{(Eq. 17b)}$$
What we have here is the leaky-integrator neuron with a decay rate of $1 - \frac{1}{\tau^i}$.
Figure 5.18. An extension of the error back-propagation scheme to CTRNNs. The figure shows how the error generated at the nth step is propagated back to the (n−2)nd step. Arrows with continuous lines, dotted lines, and chain lines denote the back-propagation error generated at the nth step, the (n−1)st step, and the (n−2)nd step, respectively. These errors continue to back-propagate along the forward connection over time.
After the forward computation with these leaky-integrator neural units, the back-propagation computation is initiated by computing the error between the training target and the output at the $n$th step. The delta error at the output unit in the $n$th step is computed as $\delta_n^0$. Then, this delta error is back-propagated to the 1st, 2nd, 4th, and 5th units, as denoted by continuous arrows, through forward connections. These delta errors propagated to local units are further back-propagated to the 1st, 2nd, 3rd, and 4th units in the (n−1)st step and to the 1st, 2nd, and 4th units in the (n−2)nd step. Additionally, the delta errors generated at the output unit in steps n−1 and n−2 are also back-propagated in the same manner, as denoted by dotted-line and chain-line arrows, respectively. This back-propagation process is recursively repeated until the 1st step of the sequence is reached. One important note here is that the way of computing the delta error in the CTRNN differs from that in the conventional RNN because of the leaky-integrator term in the forward activation dynamics defined in Eq. 17a. The delta error at the $i$th unit, $\frac{\partial E}{\partial u_t^i}$, either for an output unit or an internal unit, is recursively calculated from the following formula:
$$\frac{\partial E}{\partial u_t^i} = \begin{cases} -(\bar{o}_t^i - o_t^i)\cdot o_t^i\cdot(1-o_t^i) + \left(1-\dfrac{1}{\tau^i}\right)\dfrac{\partial E}{\partial u_{t+1}^i} & i \in Out \\[6pt] \displaystyle\sum_{k\in N}\frac{\partial E}{\partial u_{t+1}^k}\left[\delta_{ik}\left(1-\frac{1}{\tau^i}\right) + \frac{1}{\tau^k}\, w_{ki}\, a_t^i\,(1-a_t^i)\right] & i \notin Out \end{cases} \qquad \text{(Eq. 18)}$$
From the right-hand side of Eq. 18, it can be seen that the $i$th unit in the current step $t$ inherits a large portion, $1 - \frac{1}{\tau^i}$, of the delta error $\frac{\partial E}{\partial u_{t+1}^i}$ from the same unit in the next step $t+1$ when its time constant $\tau^i$ is relatively large. It is noted that Eq. 18 turns out to be the conventional, discrete-time version of BPTT when $\tau^i$ is set at 1.0. This means that, in a network with a large time constant, the error back-propagates through time with a small decay rate. This enables the learning of long-term correlations latent in target time profiles by filtering out fast changes in the profiles. All delta errors propagated from different units are summed at each unit in each step. For example, at the 1st unit in the (n−1)st step, the delta errors propagated from the 0th, 2nd, and 1st units are summed to obtain the error for the (n−1)st step. By utilizing the delta errors computed for local units at each step, the updated weights for the input connections to those units in step n−1 are obtained by following Eq. 13. Although the aforementioned models of feed-forward networks, the RNN, and the CTRNN employ the error back-propagation scheme as the central mechanism for learning, their biological plausibility in neuronal circuits has been questioned. However, some supportive evidence has been provided by Mu-ming Poo and colleagues (Fitzsimonds et al., 1997; Du & Poo, 2004), as well as by Harris (2008) in related discussions. It has been observed that the action potential back-propagates through dendrites when postsynaptic neurons on the downstream side fire upon receiving synaptic inputs above a threshold from the presynaptic neurons on the upstream side. What Poo has further suggested is that such synaptic inhibition or potentiation depending on informational activity can propagate backward across not just one but several successive synaptic connections. We can, therefore, speculate that the retrograde axonal signal (Harris, 2008) conveying error information might propagate from the peripheral area of sensory–motor input-output to the higher-order cortical area, modulating its contextual memory structures by passing through multiple layers of synapses and neurons in real brains, like the delta error signal back-propagates from the output units to the internal units in the CTRNN model. In light of this evidence, then, the biological plausibility of this approach appears promising. It should also be noted, however, that counterintuitive results have been obtained by other researchers. For example, using the “echo-state network” (Jaeger & Haas, 2004), a version of the RNN in which internal units are connected with randomly predetermined constant weights and
only the output connection weights from the internal units are modulated without using error back-propagation, Jaeger and Haas showed that quite complex sequences can be learned with this scheme. My question here would be what sorts of internal structures can be generated without the influence of error-related training signals. The next section introduces neurorobotics studies that use some of the neural network models, including the feed-forward network model and the RNN model.
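Before moving on, the echo-state scheme just mentioned is compact enough to sketch. In the toy version below, the sizes, scalings, and the ridge-regression readout are illustrative assumptions: the recurrent reservoir weights stay fixed at random values and only the linear readout is trained.

```python
# A toy echo-state network: the recurrent reservoir weights stay fixed at
# random values; only the linear readout is trained (here by ridge
# regression). Sizes and scalings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_res, T, washout = 200, 1000, 100
t = np.arange(T)
u = np.sin(0.2 * t)               # input signal
y_target = np.sin(0.2 * (t + 1))  # task: predict the next input value

W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius below 1
w_in = rng.uniform(-0.5, 0.5, size=n_res)

x = np.zeros(n_res)
states = np.zeros((T, n_res))
for k in range(T):
    x = np.tanh(W @ x + w_in * u[k])  # reservoir update; W is never learned
    states[k] = x

X, Y = states[washout:], y_target[washout:]  # drop the initial transient
w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y)
print("readout RMSE:", np.round(np.sqrt(np.mean((X @ w_out - Y) ** 2)), 4))
```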
5.6. Neurorobotics from the Dynamical Systems Perspective
Although Rodney Brooks did not delve deeply into research on adaptive or learnable robots, other researchers have explored such topics while seriously considering the issues of embodiment emphasized in the behavior-based approach. A representative researcher in this field, Randall Beer (2000), proposed the idea of considering the structural coupling between the neural system, the body, and the environment, as illustrated in Figure 5.19. The internal neural system interacts with its body and the body interacts with its surrounding environment, so the three can be viewed as a coupled dynamical system. In this setting, it is argued that the objective of neural adaptation is to keep the behavior of the whole system within a viable zone. Obviously, this thought is quite analogous to the Gibsonian and Neo-Gibsonian approaches described in section 5.2. In the 1990s, various experiments were conducted in which different neural adaptation schemes were applied in the development of sensory–motor coordination skills in robots. These schemes included: evolutionary learning (Koza, 1992; Cliff et al., 1993; Beer, 1995; Nolfi & Floreano, 2000; Di Paolo, 2000; Ijspeert, 2001; Ziemke & Thieme, 2002; Ikegami & Iizuka, 2007), which uses artificial evolution of genomes encoding connection weights for neural networks based on principles such as survival of the fittest; value-based reinforcement learning (Edelman, 1987; Meeden, 1996; Shibata & Okabe, 1997; Morimoto & Doya, 2001; Krichmar & Edelman, 2002; Doya & Uchibe, 2005; Endo et al., 2008), wherein the connection weights are modified in the direction of reward maximization; and supervised and imitation learning (Tani & Fukumura, 1997; Gaussier et al., 1998; Schaal, 1999; Billard, 2000; Demiris & Hayes, 2002; Steil et al., 2004), wherein a teacher or
Figure 5.19. The neural system, the body, and the environment are considered as a coupled dynamical system by Randall Beer (2000).
Most of these experiments were conducted in minimal settings with rather simple robots (mobile robots with range sensors in many cases) and small-scale neural controllers, influenced by Gibsonian and behavior-based philosophy. Although the experiments might have lacked scalability both with respect to engineering applications and to accounting for human cognitive competence, they do demonstrate that nontrivial structures in terms of "minimal cognition" can emerge in the structural coupling between simple neural network models and the environment. Let's look now at a few examples of these studies from among the many that have been conducted. In particular, the following studies emphasize the dynamical systems perspective in generating minimal cognitive behaviors in neurorobots.

5.6.1 Evolution of Locomotion with Limit Cycle Attractors

It is widely held that rhythmical movements in animals, such as locomotion, are generated by neural circuits called central pattern generators (CPGs), which produce oscillatory signals by means of limit-cycle dynamics in neural circuits (Delcomyn, 1980). By constructing synthetic
simulation models and conducting robotics studies based on the concept of CPGs, a number of researchers have investigated the adaptation mechanisms of walking locomotion in various animals: six-legged insects (Beer, 1995b), four-legged dogs (Kimura et al., 1999), and two-legged humans (Taga et al., 1991; Endo et al., 2008), as well as walking and swimming via spinal oscillation in four-legged salamanders (Ijspeert, 2001). In particular, Beer (1995b) investigated how stable walking can be achieved by six-legged insect-like robots under different conditions of interaction between the internal neural system and the environment, by utilizing artificial evolution within CTRNN models. In this artificial evolution scheme, the connectivity weights in the CTRNN are randomly modulated in terms of "mutation." If some robots exhibit better performance on the predefined fitness functions with the modulated network weights than others, these robots are allowed to "reproduce," with their "offspring" inheriting the same connectivity weights. Otherwise, the characteristic connectivity weights of the networks are not "reproduced." Thus, connectivity weights are adapted in the direction of maximizing fitness over generations of population dynamics. In Beer's model, each leg is controlled by a local CTRNN consisting of a small number of neural units. The motor outputs determine the torques generated when the legs move forward and backward, and the resulting leg states serve as sensory inputs to the network. The six local CTRNNs are sparsely connected to generate overall body movement. During the evolutionary learning stage, the connectivity weights within each local CTRNN as well as the interconnections between the six local CTRNNs are mutated, and the fitness of an individual is evaluated by measuring the maximum forward walking distance within a specific time period. An interesting finding from Beer's simulation experiments on artificial evolution is that the evolved locomotion mechanisms were qualitatively different under different evolutionary conditions. First, if the sensory inputs were constantly enabled during evolution, a "reflective pattern generator" evolved. Because the leg movements were generated by means of reflections of sensory inputs, the locomotive motor pattern was easily distorted when the sensory inputs were disrupted. Second, if the sensory inputs were made completely inaccessible to the network during evolution, a CPG-type locomotive controller evolved. This evolved controller could generate autonomous rhythmic oscillation without any external drives, by means of a self-organizing limit cycle attractor in the CTRNN.
Figure 5.20. Six-legged locomotion patterns generated by the evolved mixed pattern generator: (a) shows a gait pattern with sensory feedback, and (b) shows one without sensory feedback. The case with sensory feedback shows more stable oscillation with tight coordination among the different legs. Adapted from Beer (1995b) with permission.
Third, if the presence of the sensory inputs was made unreliable during evolution, a "mixed pattern generator" evolved. Although this controller could generate robust basic locomotion patterns even when the sensory inputs were disrupted, it demonstrated better locomotion performance when sensory feedback was available (Figure 5.20). In summary, these experiments showed that limit cycle attractors can emerge in the course of evolving a CTRNN controller for generating locomotion in different ways, depending on the parameters set for the evolution process. When sensory feedback is available, a limit cycle is organized in the coupling between the internal dynamics of the CTRNN and the environment dynamics. Otherwise, the limit cycle attractor appears in the form of an autonomous dynamic in the CTRNN alone. Beer speculated that the mixed strategy that emerges under the condition of unreliable sensory feedback is the most typical among biological pattern generators.
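The mutate-select-reproduce cycle described above can be caricatured compactly. Because no body or physics simulator is included here, the fitness below is a toy stand-in for "maximum forward walking distance" (it rewards sustained oscillation of one unit, a crude proxy for rhythmic stepping); the population size, mutation scale, and time constants are likewise invented:

```python
import numpy as np

rng = np.random.default_rng(2)
N, POP, GENS = 5, 30, 100
TAU = np.full(N, 2.0)

def rollout(W, T=200):
    """Forward dynamics of a small CTRNN with a constant bias drive."""
    u, out = np.zeros(N), []
    for _ in range(T):
        u = (1 - 1 / TAU) * u + (1 / TAU) * (W @ np.tanh(u) + 0.1)
        out.append(np.tanh(u[0]))
    return np.array(out)

def fitness(W):
    """Stand-in for 'forward walking distance': reward sustained
    oscillation of one unit after the initial transient."""
    y = rollout(W)[50:]
    oscillates = np.abs(np.diff(np.sign(y))).sum() > 4   # zero crossings
    return y.std() * oscillates

population = [0.5 * rng.standard_normal((N, N)) for _ in range(POP)]
for gen in range(GENS):
    scores = np.array([fitness(W) for W in population])
    elite = [population[i] for i in np.argsort(scores)[-POP // 3:]]  # select
    population = [e + 0.05 * rng.standard_normal((N, N))             # mutate
                  for e in elite for _ in range(3)]                  # reproduce
print("best fitness:", max(fitness(W) for W in population))
```

Despite its crudeness, the loop illustrates the essential point: connectivity weights drift in the direction of higher fitness purely through mutation and differential reproduction, with no gradient signal.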
5.6.2 Developing Sensory–Motor Coordination

Schemes of evolutionary learning have been applied to robots for various goal-directed tasks beyond locomotion by developing sensory–motor coordination adequate for such tasks. Scheier, Pfeifer, and Kuniyoshi
Figure 5.21. The Khepera robot, which features two wheel motors and eight infrared proximity sensors mounted in the periphery of the body. Source: Wikipedia.
(1998) showed that nontrivial perceptual categorization capabilities can be acquired by inducing interactions between robots and their environments. They prepared a workspace for a miniature mobile robot (55 mm in diameter) called Khepera (Figure 5.21), in which large and small cylindrical objects were placed at random. The behavioral task for the robot was to approach large cylindrical objects and to avoid small ones. This task is far from trivial because the sensing capabilities of the Khepera robot are quite limited, consisting of just eight infrared proximity sensors attached to the periphery of the body. The robot can thus acquire eight directional range images representing distances to obstacles, but detection occurs only when an obstacle is within 3 cm, and the images are of low resolution. Scheier and colleagues implemented a feed-forward neural network model that receives six directional range images from the sensors at the front and controls the speeds of the left and right motors. The synaptic weights determining the characteristics of the mapping from sensor inputs to motor outputs were obtained in an evolutionary way. The fitness value for evolutionary selection increased when the robot stayed close to large cylindrical objects and decreased when it stayed close to small ones. It was reported that when the robot evolved a successful network to accomplish the task, it would wander around the environment until it found an object and then would start circling it (Figure 5.22).
Figure 5.22. An illustration of the behavior trajectory generated by a successfully evolved Khepera robot. It would wander around the environment until it found a cylinder of large size and then would start circling it.
The robot would eventually leave its trajectory if the object was a small cylinder, but would keep circling if the object was large. Because it was difficult to distinguish between large and small cylindrical objects by means of passive perception using the installed low-resolution proximity sensors, the evolutionary process found an effective scheme based on active perception. In this scheme, the successfully evolved robot circled a cylindrical object, whether small or large, simply by following the curvature of its surface, utilizing information from the proximity sensors on one side of its body. A significant difference was found between large and small objects in the way the robot circled them, generating different profiles of motor output patterns, which enabled the two object types to be identified. This example clearly shows that this type of active perception is essential for the formation of the robot's behavior, whereby perception and action become inseparable. In effect, sensory–motor coordination for active perception was naturally selected in their experiment.
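The controller under selection in Scheier and colleagues' setup is just a small feed-forward mapping from the six frontal proximity readings to the two wheel speeds, evolved under a fitness that rewards time spent near large cylinders and penalizes time spent near small ones. A minimal sketch, with invented shapes and thresholds and with the arena simulation omitted, looks as follows:

```python
import numpy as np

def controller(ranges6, W, b):
    """Evolved feed-forward mapping: six frontal proximity readings in,
    left and right wheel speeds out (weights come from evolution)."""
    return np.tanh(W @ ranges6 + b)

def step_fitness(dist_to_large, dist_to_small, near=0.03):
    """Fitness shaping for selection: staying near a large cylinder pays,
    staying near a small one costs (threshold in meters, invented here)."""
    return int(dist_to_large < near) - int(dist_to_small < near)

rng = np.random.default_rng(3)
W, b = 0.5 * rng.standard_normal((2, 6)), np.zeros(2)
print(controller(np.array([1.0, 1.0, 0.2, 0.2, 1.0, 1.0]), W, b))
print(step_fitness(0.02, 0.50), step_fitness(0.50, 0.02))  # +1, -1
```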
Nolfi and Floreano (2002) showed another good example of evolution based on active perception, but in this case with the added element of self-organization, the so-called behavior attractor. They showed that a Khepera robot equipped with a simple perceptron-type neural network model can evolve to distinguish between walls and cylindrical objects, avoiding walls while staying close to cylindrical objects. After the process of evolution, the robot moves around, avoiding walls and staying close to cylindrical objects whenever it encounters them. Here, staying close to cylindrical objects does not mean stopping. Rather, the robot continues to move back and forth and/or left and right while keeping its angular position relative to the object almost constant. A steady oscillation of sensory–motor patterns with small amplitude was observed while the robot stayed close to the object. Nolfi and Floreano inferred that the robot could keep its relative position by means of active perception mechanized by a limit cycle attractor developed in the sensory–motor coupling with the object. These two experimental studies with the Khepera robot show that nontrivial schemes for sensory–motor coordination can emerge via network adaptation through evolution even when the network structure is relatively simple.

Before closing this subsection, I would like to introduce an intriguing scheme proposed by Gaussier and colleagues (1998) for generating immediate imitation behaviors in robots. The scheme is based on the aforementioned thoughts of Nadel (see section 5.2) that immediate imitation as a means of communication can be generated by synchronization, achieved by a simple sensory–motor mapping organized under the principle of homeostasis. Gaussier and colleagues built an arm robot with a vision camera that learned a mapping between the arm's position as perceived in the visual frame and the proprioception (joint angles) of its own arm, using a simple perceptron-type neural network model. After the learning, another robot of similar configuration was placed in front of the robot and moved its arm (Figure 5.23).
Figure 5.23. A robot generates immediate imitation of another robot's movement by using an acquired visuo-proprioceptive mapping (Gaussier et al., 1998).
When the self-robot perceived the arm of the other robot as its own, its own arm moved in synchrony with that of the other, for the sake of minimizing the difference between the current proprioceptive state and its estimate obtained from the output of the visuo-proprioceptive map under the homeostasis principle. This study nicely illustrates that immediate imitation can be generated as synchronicity by using a simple sensory–motor mapping, which also supports the hypothesis of the "like me" mechanism described in section 5.2. Next, we look at a robotics experiment that uses sensory–motor mapping, but in a context-dependent manner.

5.6.3 Self-Organization of Internal Contextual Dynamic Structures in Navigation

We should pause here to remind ourselves that the role of neuronal systems should not be regarded as a simple mapping from sensory inputs to motor outputs. Recalling Maturana and Varela (1980), neural circuits are considered to exhibit endogenous dynamics, wherein sensory inputs and motor outputs are regarded as perturbations of and readouts from the dynamical system, respectively. This should also be true if we assume dynamic neural network models with recurrent connections, such as RNNs or CTRNNs. The following study shows such an example from my own investigations of learning goal-directed navigation, done in collaboration with Naohiro Fukumura (Tani & Fukumura, 1993, 1997). The experiment was conducted with a real mobile robot named Yamabico (Figure 5.24a). The task was designed in such a way that a mobile robot with limited sensory capabilities learns to navigate given paths in an obstacle environment through teacher supervision. It should be noted that the robot cannot access any global information, such as its position in the X-Y coordinate system of the workspace. Instead, the robot has to navigate the environment depending solely on its own ambiguous sensory inputs in the form of range images representing the distance to surrounding obstacles. First, let me explain a scheme called "branching" that is implemented in low-level robot control. The robot is preprogrammed with a collision-avoidance maneuvering scheme that determines its reflex behavior by using inputs from the range sensors. The range sensors perceive range images from 24 angular directions covering the front of the robot.
Figure 5.24. Yamabico robot and its control architecture. (a) The mobile robot Yamabico employed in this experiment. (b) An example of a collision-free movement trajectory that contains four branching points labeled 1 to 4. (c) The corresponding flow of range sensor inputs, where brighter (closer) and darker (farther) parts indicate their ranges. The exact range profile at each branching point is shown on the right. Arrows indicate the branching decision to "advance" to a new branch or to "stay" at the current one. (d) The employed RNN model, which receives inputs from the range sensors and outputs the branching decision at each branching point.
The robot essentially moves toward the largest open space in a forward direction while maintaining equal distance to obstacles on its left and right sides. Then, a branching decision is required when a new open space appears. Figure 5.24b,c illustrates how branching takes place in this workspace.
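A cartoon of this preprogrammed low-level scheme may help; the gains, thresholds, and geometry below are invented for illustration and are not the actual Yamabico controller:

```python
import numpy as np

def reflex_step(ranges):
    """ranges: 24 readings over the frontal semicircle (index 0 = far
    left). Steer toward the widest opening while balancing the left and
    right clearances."""
    angles = np.linspace(-np.pi / 2, np.pi / 2, ranges.size)
    open_dir = angles[np.argmax(ranges)]       # largest open space
    balance = ranges[-1] - ranges[0]           # right minus left clearance
    return open_dir + 0.2 * balance            # steering command

def is_branching(prev_ranges, ranges, jump=1.0):
    """A branching decision is required when a new open space appears,
    i.e., some direction suddenly reads much deeper than before."""
    return bool(np.any(ranges - prev_ranges > jump))

r0 = np.full(24, 0.5)
r1 = r0.copy()
r1[18] = 3.0                                   # a corridor opens on the right
print(reflex_step(r1), is_branching(r0, r1))
```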
Once this branching scheme is implemented in the robot, the essence of learning how to navigate the environment is reduced to the task of learning the correct branching sequences associated with the sensory inputs at each branching point. Here, the RNN model is used for learning the branching sequences. Figure 5.24d shows how the Jordan-type RNN (Jordan, 1986) explained previously was used in the current navigation task. In this architecture, the original 24-dimensional range images are reduced to a three-dimensional vector by a preprocessing scheme. This reduced sensory vector is provided as input to the RNN at each branching step, and the RNN outputs the corresponding branching decision along with the context outputs. Learning proceeds under supervision, wherein the experimenter trains the robot to generate correct branching on specified target routes. The target route in this experiment is designed such that cyclic trajectories emerge, alternating between a figure-8 and a circular trajectory, as shown in Figure 5.25a. In the actual training, the robot is guided repeatedly into this target cyclic route, starting from various locations outside it (see Figure 5.25b for traces of the training trajectories). A set of sequential data consisting of the sensory inputs and branching decisions along the branching sequences is thereby acquired. This sequential data is used to train the RNN so that it can generate correct branching decisions upon receiving sensory inputs in the respective sequences. Note that this is not simple learning of an input-output mapping, because the sensory inputs do not necessarily determine the branching outputs uniquely. For example, the decision whether to move left and down or straight and down at the switching position denoted A in Figure 5.25a should depend on the current context (i.e., whether the last travel was a figure-8 or a circular trajectory) rather than solely on the sensory inputs, because the latter are the same in both cases. This is called the sensory aliasing problem. It is expected that such differentiation of context unit activation can be achieved through adaptation of the connection weights. After the training stage, the experimenter examines how well the robot can accomplish the learned task by placing it at arbitrary initial positions. Figure 5.25c shows two examples of evaluation trials, in which it can be seen that the robot always converges toward the desired loop regardless of its starting position. The time required for achieving convergence differs in each case.
Figure 5.25. Training and evaluation trajectories. (a) The target trajectory, which the robot loops around, forming a sequence of figure-8 and circular trajectories, with A as the switching point between the two sequences; (b) the traces of the training trajectories; and (c) the traces of evaluation trajectories starting from arbitrary initial positions. Adapted from Tani and Fukumura (1997) with permission.
Even if the robot leaves the loop after convergence under the influence of noise, it always returns to the loop after a time. These observations indicate that the robot has learned the objective of the navigation task as embedded in the attractor dynamics of limit cycles, which are structurally stable. It is interesting to examine how the task is encoded in the internal dynamics of the RNN. Investigating the activation patterns of the RNN after its convergence toward the loop shows that the robot is exposed to a lot of noise during navigation: the sensory input vector becomes unstable at particular locations, and that
the number of branches in one cycle is not constant, even though the robot seems to follow the same cyclic trajectory. At the switching point A, for either route, the sensory input receives noisy jitter in patterns independent of the route. The context units, on the other hand, are completely identifiable between the two decisions, which suggests that the task sequence alternating the two routes is hardwired into the internal contextual dynamics of the RNN, even in a noisy environment. To sum up, the robot accomplished the navigation task by means of the convergent attractor dynamics that emerge in the coupling of internal and environmental dynamics. Furthermore, situations in which sensory aliasing and perturbations arise can be disambiguated, in navigating repeatedly experienced trajectories, by the self-organized autonomous internal dynamics of the RNN.
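The core trick, context units resolving aliased sensory inputs, can be shown with a deliberately hand-wired toy. This is a stand-in for the structure the trained RNN self-organizes, not the trained network itself: the same sensory input at point A yields alternating branching decisions because an internal state toggles between visits:

```python
import numpy as np

SENSOR_AT_A = np.array([0.2, 0.8, 0.5])    # the (aliased) input at point A

def step(sensor, context):
    """One branching step of a hand-wired recurrent map: at point A the
    branching output is read from the context state, not from the input,
    and the context toggles so the next visit yields the other route."""
    at_A = np.allclose(sensor, SENSOR_AT_A)
    out = context if at_A else 0
    new_context = 1 - context if at_A else context
    return out, new_context

ctx = 0
for visit in range(4):                     # four consecutive visits to A
    decision, ctx = step(SENSOR_AT_A, ctx)
    print("visit", visit, "-> branch", decision)
# Identical sensory input, alternating decisions: figure-8, then circle.
```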
5.7. Summary

This chapter introduced the dynamical systems approach for modeling embodied cognition. It started with an introduction to nonlinear dynamics covering the characteristics of different classes of attractor dynamics. Then, it described Gibsonian and Neo-Gibsonian ideas in psychology and developmental psychology, ideas central to the contemporary philosophy of embodied minds (Varela et al., 1991). These ideas fit quite well with the dynamical systems approach, and this chapter looked at how they have influenced behavior-based robotics and neurorobotics researchers who attempt to understand the essence of cognition in terms of the dynamic coupling between internal neural systems, bodies, and environments. The chapter also provided brief tutorials on connectionist neural network models, with special focus on dynamic neural network models including the RNN and the CTRNN. It concluded by introducing studies on neurorobotics that aim to capture minimal cognitive behaviors based on the ideas of nonlinear dynamical systems and utilizing the schemes of dynamic neural network models. Although the dynamical systems views introduced in this chapter in terms of Gibsonian psychology, connectionist-level modeling, and neurorobotics may provide plausible accounts of some aspects of embodied cognition, some readers might feel that these do not solve all
of the essential problems outstanding in the study of cognitive minds. They may ask how the dynamical systems approach described so far can handle difficult problems, including those of compositionality in cognition, of free will, and of consciousness. On the other hand, others such as Takashi Ikegami have argued that simple dynamic neural network models are sufficient to exhibit a variety of higher-order cognitive behaviors, such as turn taking (Ikegami & Iizuka, 2007) or free decision (Ogai & Ikegami, 2008), provided that the dynamics of the coupling of bodies and environments develop as specific classes of complex dynamics. The next chapter introduces my own thoughts on the issue, putting more emphasis on subjectivity than on the objective world as we try to articulate a general account of embodied cognition through the study of neurodynamical robot models.
Part II Emergent Minds: Findings from Robotics Experiments
6 New Proposals
The examples of “learnable neurorobots” described in chapter 5 illustrate how various goal-directed tasks can be achieved through self-organizing adequate sensory–motor coupling between the internal neuronal dynamics and the body–environment dynamics. Although the adaptive behaviors presented so far seem to capture at least some of the essence of embodied cognition, I feel that something important is still missing. That something is the subjectivity or intentionality of the system.
6.1. Robots with Subjective Views

Phenomenologists might argue that subjectivity cannot be detected explicitly because the goal of embodied cognition is to combine the subjective mind and the objective world into a single inseparable entity through interactions with the environment. However, I argue that such a line of robotics research focuses only on "reactive behavior" based on the perception-to-motor cycle and, therefore, might never be able to access the core problem of the dichotomy between the subjective mind and the objective world. All these robots do is generate motor commands reactively, adequate to the current sensory inputs or to the current internal states summed with past sequences of sensory inputs.
When my group conducted the robot navigation experiments aimed at learning cyclic trajectories mentioned in section 5.6, I was at first interested in observing the emergent behaviors of the robot in terms of the diverse trajectories generated in its transient states before converging to a limit cycle. However, after a while, I began to feel that robots with such reactive behaviors are simply like the steel balls in pinball machines, repeatedly bouncing against the pins until they finally disappear down the holes. Although we might see some complexity on the surface of these behaviors, they are fundamentally different from those generally expected of humans in the contexts of both phenomenology and neuroscience. The behaviors of these robots seem too automatic, requiring no effort, as happens in machines, which show no traits of subjectivity. They might be analogous to the behaviors of patients with alien hand syndrome, which are generated automatically, as afforded by the related perception, without subjective or intentional control (see section 4.2). Going back to Husserl (2002), he considered that the world consists of objects that the subject can consciously meditate on or describe. The bottom line, however, is that direct experiences for humans originate not in such consciously representable objects but in the continuity of direct experiences in time. As described in chapter 3, in considering this problem Husserl assumed a three-level structure of phenomenological time consisting of the absolute flow at the deepest level, the preempirical time level of retention and protention, and objective time at the surface level. He also considered that the continuous flow of experiences becomes articulated into consciously accessible events or objects as a result of its development through these phenomenological levels. According to Husserl, this development is achieved through a process of interweaving double intentionality, namely transversal (retention and protention) and longitudinal (immanence of levels) intentionality, into the unitary flow of consciousness. Certainly, robots characterized by reactive behavior have nothing to do with such intentionality for consolidating as-yet-unknown everyday experiences into describable or narrative objects. This is both good and bad. Although such robots might be able to mimic smart insects, such as tumblebugs that skillfully roll balls of dung down pathways, at this level of sophistication they are not yet capable of either authentic or inauthentic being as characterized by Heidegger (see section 3.4). This is to say that current robots cannot, like human beings, construct their own subjective views of the world by structuring and objectifying
experiences accumulated through interactions with the world, and especially with other beings more or less like themselves within it. Constructed by each individual constantly facing problems unique to that individual's place within the objective world, such characteristic viewpoints and the experiences that underlie them represent the subjectivity of the individual within the greater social system. What I would like to build are robots that realize what Heidegger considered authentic being, a character that presumably emerges in the dynamic interplay between looking ahead toward possible futures and reflecting on one's own unique past in order to recruit the resources necessary to enact and realize the most possible future shared with others (see section 3.4). How can such subjective views be constructed? Clearly, we are at the formative stages of this work. However, some clue as to how to begin, and make no mistake, this is the very beginning, appeared first in section 4.2, which explained the possible role of the predictive model for action generation and recognition in the brain. As Gibson and Pick (2000) conjectured, a set of perceptual structures obtained when an active learner engages in perceptual interaction with the environment and extracts information from it can be regarded as a subjective view belonging to that individual. Such an agent can have a proactive expectation of what the world should look like as it performs its intended actions. The developmental psychologist Claes von Hofsten has demonstrated that even 4-month-old infants exhibit such anticipatory behaviors. They track moving objects even when these are temporarily hidden from view, making a saccade to the reappearance point before the object reappears there (Rosander & von Hofsten, 2004). When they plan to reach for an object, their hands start to close before the object is encountered, taking into account the direction of and distance to the object (von Hofsten & Rönnqvist, 1988). These infants have prospects for their actions. These are the formative stages in the development of a potentially authentic being.
6.2. Engineering Subjective Views into Neurodynamic Models

So, as a first step in understanding how an artificial agent such as those under consideration in this book may be engineered with the capacity to act and eventually to be responsible for its actions, and moreover for how
the world turns out because of them, we now need to consider a theoretical conversion from reactive behavior generated by means of perception-to-action mapping to proactive behavior generated by means of intention-to-perception mapping. Here, perception is active and should be considered as a subject acting on objects of perception, as Merleau-Ponty (1968) explained in terms of visual palpation (see section 3.5). In terms of the neurodynamic models from which our robots are constructed, the perceptual structure for a particular intended action can be viewed as vector flows in the perceptual space as mapped from this intention. The vector flows constitute a structurally stable attractor. Let me explain this idea by considering some familiar examples. Suppose the intended action is your right hand reaching for a bottle from an arbitrary posture. If we consider a perceptual space consisting of the visually perceived end-point position of the hand and the proprioception of the hand posture at each time step, the perceptual trajectories for reaching the bottle from arbitrary positions in this visuo-proprioceptive space can be illustrated with reduced dimensionality, as shown in Figure 6.1a, as a flow toward and convergence of vectors around an attractor that stands as the goal of the action. These trajectories, and the actions that arise from them, can be generated by fixed-point attractor dynamics (see section 5.1). In this case, the position of the fixed point varies depending on the position of the object in question, but all actions of a similar form can be generated by this type of attractor. Another example is that of shaking a bottle of juice rhythmically. In this case, we can imagine the vector flow in the perceptual space as illustrated in Figure 6.1b, which corresponds to limit cycle attractor dynamics. The essence here is that subjective views or images of intended actions can be developed as perceptual structures represented by the corresponding attractors embedded in the neural network dynamics, as we have seen with CTRNN models that can develop various types of attractors (section 5.5). By switching from one intention to another, the corresponding subjective view, in terms of perceptual trajectories, is generated in a top-down manner. These perceptual structures might be stored in the parietal cortex in association with intentions received from the prefrontal cortex, as discussed in section 4.2. This idea is analogous to the Neo-Gibsonian theory (Kelso, 1995), in which movement patterns can be shifted by phase transitions due to changes in system parameters (see section 5.2).
Figure 6.1. The perceptual trajectories for different intended actions in visuo-proprioceptive space, for (a) approaching an object and (b) shaking it.
The top-down projection of the subjective view should in general have several (if only implicit) levels, wherein the views at higher levels might be more abstract and those at lower levels more concrete and detailed. Also, top-down views of the world should be "compositional" enough that proactive views for various ways of intentionally interacting with the world can be represented by systematically recombining parts of images extracted from accumulated experiences. For example, to recall once again the very familiar image of everyday routine action with which this text began, when we intend to drink a cup of coffee, the higher level may combine a set of subintentions for primitive actions such as reaching-to-cup, grasping-cup, and bringing-cup-to-mouth in sequences that may be projected downward to a lower level where detailed proactive images of the corresponding perceptual trajectories can be generated. Ultimately, perceptual experiences associated with various intentional interactions with the world come to resemble a semantically combinatorial language of thought (Fodor & Pylyshyn, 1988). One essential question is how the higher level can manipulate or combine action primitives or words systematically. Do we need a framework of symbol representation and manipulation, especially at the higher cognitive level, for this purpose? If I said yes to this, I would be criticized just as Dreyfus criticized Husserl or as Brooks criticized conventional AI and cognitive science research. What I propose is this: We need a neurodynamic system, well formed through adaptation, that can afford compositionality as well as
systematicity, and that gives the impression that discrete symbols exist within the system and that these symbols are manipulated by it. The model of "compositional" or "symbolic" mind to which I am now pointing is not impossible to achieve in a neurodynamic system, if we remember that the sensitivity of chaos to initial conditions exhibits a sort of combinatory mechanics through folding and stretching in phase space. Such chaotic dynamics can produce combinatory sequences of symbols in terms of symbolic dynamics, via partitioning of the continuous state space with a finite number of labels, as described in section 5.1. In simpler terms, the continuous space of action can be cut up into chunks, and these chunks can be referenced as things in themselves: represented, symbolized. Dale and Spivey (2005) have provided a sympathetic argument, proposing that the promise of symbolic dynamics lies in articulating the transition from dynamical, continuous descriptions of perception into the theoretical language of discrete, algorithmic processes for high-level cognition. What I am saying here is that the segmentation of "thinking" into discrete "thoughts," which are represented in terms of logical operators, as propositions, as combinations of symbols, can be performed by dynamic models of mind that do not employ discrete symbolic computation in their internal operations (Tani & Fukumura, 1995).

What about the creative composition of primitives into novel sequences of action? Neurodynamic models account for this capacity as well. Nonlinear dynamics can exhibit structural changes of varying discreteness, as seen in bifurcations from one attractor structure to another or in phase transitions brought about by controlling relatively low-dimensional external parameters. So, we may suppose that the higher level sending sequences of parameter values to the lower level of the network results in the sequential switching of primitive actions by means of parameter bifurcation in this lower neurodynamic system. And if the neurodynamics of the higher level generating these parameter sequences is driven by its intrinsic chaos, various combinatory sequences of the primitive actions can be generated. Figure 6.2 illustrates the idea.

Although an agent driven by top-down intentions for action has proactive subjective views of events experienced during its interaction with the objective environment, its cognitive mind should also reflect on unexpected outcomes through the bottom-up process to modify the current intention. This modification of the intentions for action in the bottom-up process can be achieved by utilizing information about the prediction error, a possibility mentioned briefly in the previous section.
Figure 6.2. An illustration showing how chaos can generate diverse sequential combinations of action primitives. In the higher level, the state trajectory is generated by a chaotic dynamical system with a given initial state, and the state values are sampled each time they cross a Poincaré section. These state values are successively input to a parameterized dynamical system in the lower level as its parameters (along the solid arrow), causing sequential bifurcation in the parameterized dynamical system and its associated action primitives. The lower level predicts the coming visuo-proprioceptive state, and its prediction error is monitored. The state in the higher level is modified in the direction of minimizing this error (along the dashed arrow).
Figure 6.2 also illustrates the process whereby the state values in the higher level are modified to minimize the prediction error in the lower level. This error signal might convey the experience of consciousness in terms of first-person awareness of one's own subjectivity, because the subjective intention is directly differentiated from the objective reality and the subject feels, as it were, "out of place," and thus at a difference from its own self-projection. My tempting speculation is that authentic being could be seen in a certain imminent situation caused by such error or conflict between the two. In summary, what I am suggesting is that nonlinear neurodynamics can support discrete computational mechanics for compositionality while preserving the metric space of real-number systems in which physical properties such as position, speed, weight, and color can be represented. In this way, neurodynamic systems are able to host both semantically combinatorial thoughts at higher levels and the corresponding details of their direct perception at lower levels. Because both of these share the same phase space in a coupled dynamical system, they can interact seamlessly and thus densely, unlike the symbols and patterns that interact somewhat awkwardly in the more common, so-called hybrid architectures. Meanwhile, the significance of symbolic expression is not only retained on the neurodynamic account but clarified, and with this newfound clarity we may anticipate that many historical problems in the philosophy of mind regarding the nature of representation in cognition will finally dissolve.
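The claim that chaos plus a partition yields combinatorial symbol sequences is easy to demonstrate. The sketch below uses the logistic map as the higher-level chaotic system and a two-cell partition of its state space; the map, the partition boundary, and the reading of symbols as "primitive indices" are all illustrative choices, not the book's model:

```python
def symbol_sequence(x0, n=15, r=4.0):
    """Iterate the (chaotic) logistic map and label each state through a
    binary partition of [0, 1]; the continuous trajectory thereby emits
    one discrete symbol -- here, a primitive index -- per step."""
    x, symbols = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        symbols.append(int(x >= 0.5))      # partition boundary at 0.5
    return symbols

# Nearby initial states (initial 'intentions') soon diverge, yielding
# different combinatorial sequences of the two primitives.
for x0 in (0.3, 0.3001, 0.3002):
    print(x0, symbol_sequence(x0))
```

Sampling such states at a Poincaré section and feeding them to a parameterized lower level, as in Figure 6.2, would then translate each emitted symbol into a bifurcation to the corresponding action primitive.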
6.3. The Subjective Mind and the Objective World as an Inseparable Entity

Next, let's extend this thinking further and examine how the subjective mind and the objective world might be related. Figure 6.3 illustrates conceptually how interactions between top-down and bottom-up processes take place in the course of executing intended actions. It is thought that the intention of the subjective mind (top-down) as well as the perception of the objective world (bottom-up) proceed as shown in Figure 6.3 (left panel). These two processes interact, resulting in the "recognition" of the perceptual reality in the subjective mind and the "generation" of action in the objective world (middle panel).
Figure 6.3. The subjective mind and the objective world become an inseparable entity through interactions between the top-down and bottom-up pathways. Redrawn from Tani (1998).
This "recognition" results in the modification of the subjective mind―and potential consciousness―whereas the "generation" of action modifies the objective world, and the interactions continue with the modified states of the mind and the world (right panel). In this process, we see a circular causality between action and recognition. This circular causality results in inseparable flows between the subjective mind and the objective world as they reciprocally intertwine via action-perception cycles, as Merleau-Ponty (1968) proposed. If we were able to achieve this scenario in a robot, the robot would be free from Cartesian dualism, as its subjective mind and the objective world would finally become inseparable. I want to conclude this chapter by pointing out what I consider essential for constructing models of the cognitive mind:

1. The cognitive mind is best represented by nonlinear dynamical systems defined in the continuous time and space domain, wherein their nonlinearity can provide the cognitive competence of compositionality.

2. Both natural and artificial cognitive systems should be capable of predicting the perceptual outcome of the current intention for acting on the outer world via top-down pathways, whereas the current intention is adapted by using bottom-up signals of the error detected between the prediction and the actual perceptual outcome in the action-perception cycle.
3. The underlying structure of consciousness and free will should be clarified by closely examining the nonstationary characteristics of the circular causality developed through the aforementioned top-down and bottom-up interaction between the subjective mind and the objective world. The essence of authentic being might also be clarified via such examination of the apparent dynamic structure.

The remaining chapters test these conjectures by reviewing a series of synthetic robotics experiments conducted in my laboratory. Readers should be aware that my ideas were not at all in a concrete or complete form from the very outset. Rather, they became consolidated over time as the modeling studies were conducted. Moreover, my colleagues and I never tried to put all of the assumed elements of the mind discussed thus far into our synthetic robotic models. It was not our aim to put all available neuroscience knowledge about local functions, mechanisms, and anatomy into the brains of our tiny robots. Instead, in each trial we varied and developed "minimal brains," so to speak, in dynamic neural network models of the RNN type. We tried neither to implement all possible cognitive functions in a particular robotic model nor to account for the full spectrum of phenomenological issues in each specific experiment. We concentrated on models and experiments with specific focuses; therefore, in each new trial we added elements relevant to the focus and removed irrelevant ones. My hope is that, in reviewing the outcomes of our series of synthetic robotic studies, readers will come to share the deep insights into the nature of the mind, especially how thought and its interaction with the world could arise, which I have reached in performing and reflecting on the actual experiments day to day. The next chapter examines how robots can learn about the outer environment by using a sensory prediction mechanism in the course of exploration. It also explores the issue of self-consciousness as related to this sensory prediction mechanism.
7 Predictive Learning About the World from Actional Consequences
The previous chapter argued that understanding the processes essential to the development of a subjective view of the world by way of interactive experiences within that world is crucial if we are to reconstruct the cognitive mind in another medium, such as in our neurodynamic robots. But how exactly can robots develop such subjective views from their own experiences? Furthermore, if a robot becomes able to acquire a subjective view of the world, how does it also become aware of its own subjectivity or self? In considering these questions, this chapter reviews a set of robotics experiments in the domain of navigation learning. These experiments were conducted in a relatively simple setting more than 20 years ago in my lab, but they addressed two essential questions. The first experiment addresses the question of how a compositional representation of the world can be developed by means of the self-organization of neurodynamic structures via the accumulated learning of actional experiences in the environment. The second experiment inquires into the phenomenology of the "self," or self-consciousness. I attempt to clarify its underlying structure by examining the possible interaction between top-down prediction and bottom-up recognition during robot navigation.
7.1. Development of Compositionality: The Symbol Grounding Problem

In the mid-1990s, I started to think about how robots could acquire their own images of the world from experiences gathered while interacting with their environment (Tani, 1996). Because humans can mentally generate perceptual images for various ways of interacting with the world, I wondered whether robots could develop a similar competence via learning. As my colleagues and I had just completed experiments on robot navigation learning with homing and cyclic routing, as described in chapter 5, I decided to pursue this new problem in the context of robot navigation. First, I tried to apply the forward dynamics model proposed by Masao Ito and Mitsuo Kawato (see chapter 4) directly to my Yamabico robot navigation problem. I thought that a recurrent neural network (RNN) could work as a forward dynamics model that predicts how the sensation of range images changes in response to arbitrary motor command inputs for the two wheel drives at every 500-ms time interval. However, achieving convergence of learning with the sensory–motor data acquired in the original workspace proved to be very difficult. The reason for this failure was that it was simply asking too much of the network to learn to predict the sensory outcomes of all possible combinations of motor commands at each time step. Instead, it seemed reasonable to assume that the trajectory of the robot should be generated under the constraint of smooth, collision-free maneuvering. From this assumption, I decided to employ again the scheme of branching with collision-free maneuvering shown in section 5.6. This branching scheme enables the robot to move along "topological" trajectories in a compositional way by arbitrarily combining branching decisions in sequence. By utilizing this scheme, the problem could be simplified to one wherein an RNN learns to predict just the sensation at the next branching point in response to the action command (branching decision) at the current branching point. I speculated that the RNN could acquire compositional images of the workspace by combining various branching decisions, provided that it had already learned a sufficient number of branching sequences along the topological trajectories. A focal question that the experiment was designed to address was this: What happens when the prediction differs from the actual outcome of the sensation? In this situation, a robot navigating a workspace
by referring to an internal map with a finite state machine (FSM)-like representation of the topological trajectories would experience the symbol grounding problem (see Figure 2.2) discussed in chapter 2.

7.1.1 Navigation Experiments with Yamabico

In the learning phase, the robot explores a given environment containing obstacles by taking random branching decisions. Let's assume that the robot arrives at the nth branching point, where it receives the sensory input (range image vector plus travel distance from the previous branching point) p_n and randomly determines the branching decision (0 or 1) x_n, after which it moves to the (n+1)st branching point (see the left side of Figure 7.1). The robot acquires a sequence of pairs of sensory inputs and actions (p_n, x_n) throughout the course of exploring its environment. Using these sample pairs, the RNN is trained so that it can predict the next sensory input p_{n+1} from the current sensory input p_n and the branching action x_n taken at branching point n (see the right panel of Figure 7.1). In this predictive navigation task, the context units of the RNN play the role of storing the current state in working memory, analogous to the previous Yamabico experiment described in chapter 5. The actual training of the RNN is conducted offline with the sample sequence data saved in short-term memory storage.
Figure 7.1. An RNN learning to predict the sensation at the next branching point from the current branching decision.
Once the RNN is trained, it can perform two types of prediction. One is the online prediction of the sensory inputs at the next branching point for an action taken at the current branch. The other is the offline look-ahead prediction over multiple branching steps while the robot stays at a given branching point. Look-ahead prediction is performed by closing a loop between the sensory prediction output units and the sensory input units of the RNN, as denoted by the dotted line in Figure 7.1. In the forward dynamics of an RNN with a closed sensory loop, an arbitrary number of look-ahead steps can be taken by feeding the current predictive sensory outputs back as the sensory inputs of the next step instead of employing actual external sensory inputs. This enables the robot to perform mental simulation of arbitrary branching action sequences, as well as goal-directed planning to achieve given goal states, as described later. After exploring the workspace for about 1 hour (see the exact trajectories in Figure 7.2a) and undergoing offline learning for one night, the robot's performance for online one-step prediction was tested. In this evaluation, the robot navigated the workspace from arbitrarily set initial positions by following an arbitrary action program of branchings and tried to predict the upcoming sensory inputs at the next branching point from the sensory inputs at the current one. Figure 7.2b presents an instance of this process, wherein the left panel shows the observed trajectory of the robot and the right panel shows a comparison between the actual sensory sequence and the predicted one. The figure shows nine steps of the branching sequence; the leftmost five units represent the sensory input, the next five units represent the predicted state for the next step, the following unit is the action command (branching 0 or 1), and the rightmost four units are the context units. Although the robot could not make correct predictions initially, it became increasingly accurate after the fourth step. Because the context units were initially set randomly, prediction failed at the very beginning. However, as the robot continued to travel, the sensory input sequences "entrained" the context activations into their normal, steady-state transition sequence, after which the RNN became capable of producing correct predictions. We repeated this experiment with various initial settings (different initial positions and different action programs), and the robot always started to produce correct predictions within 10 branching steps.
Figure 7.2. Trajectories of Yamabico (a) during exploration, (b) during online one-step prediction (left) and a comparison between the actual sensory sequence and its corresponding one-step prediction (right), and (c) generated after offline look-ahead prediction (left) and a comparison between an actual sensory sequence and its look-ahead prediction (right). Adapted from Tani (1996) with permission.
We also found that although the context was easily lost when perturbed by strong noise in the sensory input (e.g., when the robot failed to detect a branch and ended up in the wrong place), prediction accuracy was always recovered as long as the robot continued to travel. This autorecovery feature of the cognitive process is a consequence of the fact that a certain coherence, in terms of close matching between the internal prediction dynamics and the environment dynamics, emerges during their interaction.
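The two prediction modes can be sketched generically. The weights below are random stand-ins for the trained network, so the printed numbers are meaningless; what the sketch shows is only the structural difference that closed-loop look-ahead feeds predictions back in place of real sensory input:

```python
import numpy as np

D, C = 5, 4                                # sensory and context dimensions
rng = np.random.default_rng(4)
Wp = rng.standard_normal((D, D + 1 + C))   # stand-ins for trained weights
Wc = rng.standard_normal((C, D + 1 + C))

def rnn_step(p, x, c):
    """Map (sensation, branching action, context) to (predicted next
    sensation, next context); in the experiment these weights are learned."""
    z = np.concatenate([p, [x], c])
    return 1 / (1 + np.exp(-Wp @ z)), np.tanh(Wc @ z)

def lookahead(p0, c0, actions):
    """Closed-loop 'mental simulation': each predicted sensation is fed
    back as the next sensory input instead of a real measurement."""
    p, c, preds = p0, c0, []
    for x in actions:
        p, c = rnn_step(p, x, c)
        preds.append(p)
    return preds

plan = [1, 1, 0, 0, 1, 1, 1]               # the action program 1100111
preds = lookahead(np.full(D, 0.5), np.zeros(C), plan)
print(np.round(preds[-1], 2))              # predicted sensation at last branch
```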
Once the robot was "situated" in the environment by the entrainment process, it was able to perform multistep look-ahead prediction from branching points. A comparison between a look-ahead prediction and the actual sensory sequence during travel is shown in Figure 7.2c. The arrow in the workspace in the left panel of the figure denotes the branching point where the robot performed look-ahead prediction for an action program represented by the branching sequence 1100111. The robot, after conducting the look-ahead prediction, actually traveled following the action program, generating a figure-8 trajectory. The right panel of the figure shows a comparison between the actual sensory input sequence and its look-ahead prediction associated with the action program, along with the context activation sequence. It can be seen that the look-ahead prediction agrees with the actual sequence. It is also observed that the context values, as well as the prediction of the sensory input, at the initial and final steps are almost the same. This indicates that the robot predicted its return to the initial position at the end step of its "mental" simulation of traveling along a figure-8 trajectory. We repeated this experiment of look-ahead prediction for various branching sequences and found that the robot was able to predict sensory sequences correctly for arbitrary action programs in the absence of severe noise affecting the branching sequence. Finally, the robot was instructed to generate action plans (branching sequences) for reaching a particular goal (position) specified by a sensory image. In the planning process, the robot searched for adequate action sequences that could reach the target sensory state in the look-ahead prediction of sensory sequences from the current state while minimizing the estimated travel distance to the goal. Figure 7.3 shows the result of one particular trial, in which the robot generated three different action plans, each of which was actually executed. The figure shows the three corresponding trajectories successfully reaching a given goal from a starting position in the adopted workspace. Although the third trajectory might look redundant because of its unnecessary loop, the creation of such trajectories suggests that a sort of compositional mechanics in the forward dynamics of the RNN had developed as a result of consolidation learning. This self-organized mechanism enabled the robot to generate diverse navigational plans, as if segments of images obtained during actual navigation were combined by following acquired rules.
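The planning process can be caricatured as a brute-force search over branching programs, reusing rnn_step and lookahead from the previous sketch; the exhaustive enumeration and the simple length penalty standing in for estimated travel distance are my simplifications, not the system's actual search procedure:

```python
import itertools
import numpy as np

def plan_to_goal(p0, c0, goal, max_len=8, dist_weight=0.05):
    """Enumerate branching programs up to max_len, roll each out with the
    closed-loop predictor, and keep the program whose predicted final
    sensation best matches the goal image; the length penalty stands in
    for the estimated travel distance."""
    best, best_cost = None, np.inf
    for length in range(1, max_len + 1):
        for program in itertools.product([0, 1], repeat=length):
            p_final = lookahead(p0, c0, list(program))[-1]
            cost = np.sum((p_final - goal) ** 2) + dist_weight * length
            if cost < best_cost:
                best, best_cost = program, cost
    return best, best_cost

goal_image = np.full(5, 0.5)               # hypothetical goal sensation
print(plan_to_goal(np.full(5, 0.5), np.zeros(4), goal_image))
```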
Figure 7.3. The result of goal-directed planning. Trajectories corresponding to three different generated action programs are shown. Adapted from Tani (1996) with permission.
Some may consider that the process of goal-directed planning by the RNN is analogous to that of GPS, described in section 2.2, because the forward prediction of the next sensory state for actions to be taken in each situation of the robot seems to play the same role as the causal rule described for each situation in the problem space of GPS. However, there are crucial differences between the former, functioning in a continuous state space, and the latter, functioning in a discrete state space. We will come to understand the significance of these differences through the following analysis.

7.1.2 Analysis of the Acquired Neurodynamic Structure

After the preceding experiment, I thought it would be interesting to see what sorts of attractors or dynamical structures had emerged as a result of self-organization in the RNN and its coupling with the environment, as well as how such attractors could explain the observed phenomena, such as the look-ahead prediction of combinatorial branching sequences and the autorecovery of internal contexts by environmental entrainment. I therefore conducted a phase-space analysis of the obtained RNN to
Therefore, I conducted a phase-space analysis of the obtained RNN to examine its dynamical structure, as was shown for the Rössler attractor in chapter 5. One difference was that time integration by the forward dynamics of the RNN required feeding external inputs, in the form of branching action sequences, into the network. Therefore, the RNN in the closed-loop mode was dynamically activated for thousands of steps while being fed random branching sequences (1s and 0s). Then, the activation values of two representative context units were plotted for all steps, excluding the transient part corresponding to the first several hundred steps. It was like looking at trajectories from the mental simulation of thousands of consecutive steps of random branching sequences in the workspace while ignoring the initial transient period of state transitions. The resultant plot can be seen in Figure 7.4. We can see a set of segments (Figure 7.4a). Moreover, a magnification of a particular segment shows an assembly of points resembling a Cantor set (Figure 7.4b). The plot represents the invariant set of a global attractor, as the assembly appears in the same shape regardless of the initial values of the context units or the exact sequences of randomly determined branching sequences. This means that the context state initialized with arbitrary values always converged toward steady-state transitions within the invariant set after some transient period. It was found that, after convergence was reached, the context state shifted from one segment to another at each step, and moreover that each segment corresponded to a particular branching point.
Figure 7.4. Phase space analysis of the trained RNN. (a) An invariant set of an attractor appeared in the two-dimensional context activation space. (b) A magnification of a section of the space in (a). Adopted from Tani (1996) with permission.
Additionally, an analysis of the aforementioned experiments on online prediction revealed that, whenever the predictability of the robot was lost due to perturbations, the context state left the invariant set. However, the perturbed context state always returned to the original invariant set after several branching steps, because the invariant set had been generated as a global attractor. Our repeated experiments with different robot workspace configurations revealed that the observed properties of the RNN are repeatable and therefore general.

7.1.3 Is the Problem of Symbol Grounding Relevant?

Given that the context state shifted from one segment to another in the invariant set in response to branching inputs, we can consider that what the RNN reproduced in this case was exactly an FSM consisting of nodes representing branching points and edges corresponding to transitions between these points, as shown in Figure 2.2. This is analogous to what Cleeremans and colleagues (1989) and Pollack (1991) demonstrated by training RNNs with symbol sequences characterized by FSM regularities. Readers should note, however, that the RNNs achieve much more than just reconstructing an equivalent of the target FSM. First, each segment observed in the phase space of the RNN dynamics is not a single node but a set of points, namely a Cantor set spanning a metric space. The distance between two points in a segment represents the difference between the past trajectories arriving at the node. If the two trajectories come from different branching sequences, they arrive at points in the segment that are also far apart. On the other hand, if the two trajectories come from exactly the same branching sequences, except for the initial branching points, after passing through an infinite number of steps, they arrive at arbitrarily close neighbors in the same segment. Theoretically speaking, the set of points in a segment constitutes a Cantor set with a fractal-like structure, because this infinite number of points should be capable of representing the history of all possible combinations of branching (this can be proven by taking into account the theorem of iterative function switching [Kolen, 1994] and random dynamical systems [Arnold, 1995]). This fractal structure is actually a signature of compositionality, which has appeared in the phase space of the RNN by means of iterative random shifts of the dynamical system triggered by the given input sequences of random branching.
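The way a contractive map driven by binary inputs lays down a Cantor set encoding input history can be illustrated with a two-map iterated function system. This is a minimal sketch of the mathematical mechanism only, not the trained RNN itself: the two affine maps below are stand-ins for the context dynamics under the two branching inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two contractive maps standing in for the context dynamics under
# branching inputs 0 and 1. A contraction ratio below 1/2 leaves gaps,
# producing a Cantor set on the interval.
def f(x, bit):
    return 0.4 * x + 0.6 * bit

x = 0.5  # arbitrary initial context state
points, bits = [], []
for t in range(5000):
    b = rng.integers(0, 2)
    x = f(x, b)
    points.append(x)
    bits.append(b)

# After the transient, the position of x encodes the recent input
# history: the most recent bit selects which half of the set the point
# falls in, the bit before that selects the quarter, and so on.
pts = np.array(points[100:])
recent = np.array(bits[100:])
print("points with last bit 0 lie in [0, 0.4]:", pts[recent == 0].max() <= 0.4)
print("points with last bit 1 lie in [0.6, 1.0]:", pts[recent == 1].min() >= 0.6)
```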
Figure 7.5. The dynamic closure of steady-state transitions organized as an attractor (solid arrows) associated with a convergent vector flow (dashed arrows).
Interestingly, Fukushima and colleagues (2007) recently showed supportive biological evidence, from electrophysiological recording data, that CA1 cells in the rat hippocampus encode sequences of episodic memory with a similar fractal structure. Second, the observed segments cannot be manipulated or represented explicitly as symbols attached to nodes in an FSM. They just appear as a dynamic closure1 as a result of the convergent dynamics of the RNN. The global convergence of the context state toward steady-state transitions within the invariant set, as a dynamic closure, can afford global stability to the predictive dynamics of the RNN. An illustration of this concept appears in Figure 7.5. In the case of an FSM, on the other hand, there is no autorecovery mechanism against perturbations in the form of invalid inputs, because the FSM provides only a description of steady-state transitions within the graph and cannot account for how to recover such states from dynamically perturbed ones. As mentioned earlier, when an FSM receives invalid symbols (e.g., unexpected sensations during navigation), it simply halts operation. The discussion here is analogous to that in chapter 5 about the advantages of utilizing dissipative dynamic systems rather than sinusoidal functions for stably generating oscillatory patterns.
1. It is called a dynamic closure because the state shifts only between points in the set of segments in the invariant set (Maturana & Varela, 1980).
It is essential to understand that there is no homunculus that looks at and manipulates representations or symbols in the proposed approach. Rather, there are just iterations of a dynamical system whereby compositionality emerges. Ultimately, it can be said that this system is not affected by the symbol grounding problem because there are no “symbols” to be grounded to begin with, at least not internally. Before moving on, I should mention some drawbacks to this approach. The current scheme utilizing the forward model is limited to small-scale problems because of the frame problem discussed in section 7.1. The model worked successfully because the navigation environment was small and the branching scheme preprogrammed at the lower level simplified the navigation problem. Although the question of how robots can acquire lower-level sensory–motor skills, such as branching or collision-free maneuvering, from their own direct sensory–motor experiences is quite important, we did not address it in this study. Another problem concerns the intentionality of the robot. What the experiment showed is so-called latent learning, in which an agent learns an internal model of the environment via random exploration without any intentions. If the robot attempts to learn about all possible exploration experiences without any intentions or goals to achieve, such learning will sooner or later face the problem of combinatorial explosion. We return to these issues in later sections. The next section explores how sensory prediction learning and phenomena of self-consciousness could be related, by reviewing the results of another type of robot navigation experiment.
7.2. Predictive Dynamics and Self-Consciousness

This section examines how the notion of “self” or self-consciousness could emerge in artificial systems, as well as in human cognitive minds, through a review of further robotics experiments on prediction learning in navigation, extended from the aforementioned Yamabico ones. The following robotics experiments clarify the essential role of sensory prediction mechanisms in the possible development of self-consciousness, as presumed in the earlier chapters. Although the experiments with Yamabico described in the previous section revealed some interesting aspects of contextual predictive dynamics, they still miss some essential features, one of which is the utilization of prediction error signals.
The error signal is considered to be a crucial cue for recognizing a gap between the subjective image and objective reality. Recent evidence from neuroscience has revealed brain waves related to prediction error, as in the case of mismatch negativity, and it is speculated that they are used for fast modification of ongoing brain processes. Also, Yamabico did not have any particular bias or attention control for acquiring sensory input. It would naturally be expected that the addition of some attention control mechanism would reinforce our proposed framework of top-down prediction–expectation versus bottom-up recognition. Therefore, we introduced a visual system with an attention control mechanism in the robot platform that succeeded Yamabico. Finally, it would be interesting to incorporate such a system with dynamic or incremental learning of experiences, rather than looking at the result of one-time offline “batch” learning as in the case of Yamabico. Our findings in these robotics experiments, enriched with these new elements, suggest a novel interpretation of concepts such as the momentary self and the minimal self, which correspond to ideas developed by William James (1982) and Martin Heidegger (1962).

7.2.1 Landmark-Based Navigation Performed by a Robot with Vision

I built a mobile robot with vision provided by a camera mounted on a rotating head, as shown in Figure 7.6a (Tani, 1998). The task of this robot was to learn to dynamically predict landmark sequences encountered while navigating a confined workspace. After a successful learning process, the robot was expected to be able to use its vision to recognize landmarks, in the form of colored objects and corners, within a reasonable amount of time before colliding with them, while navigating the workspace by following the wall and the edge between the wall and the floor. It should be noted that the navigation scheme did not include branching as in the case of Yamabico, because the learning of compositional navigational paths was not the focus of this robot study. The robot was controlled by the neural network architecture shown in Figure 7.6b. The entire network consisted of parts responsible for prediction (performed by an RNN) and parts responsible for perception, the latter being divided into “what” and “where” pathways, thereby mimicking known visual cortical structures.
Figure 7.6. A vision-enabled robot and its neural architecture. (a) A mobile robot featuring vision is looking at a colored landmark object. (b) The neural network architecture employed in the construction of the robot. Adopted from Tani (1998) with permission.
In the “what” pathway, visual patterns of landmarks corresponding to colored objects were processed in a Hopfield network, which can store multiple static patterns by using multiple fixed-point attractors. When a perceived visual pattern converged toward one of the learned fixed-point attractors, the pattern was recognized and its categorical output was generated by a winner-takes-all activation network, known as a Kohonen network. Learning was initiated for both the Hopfield and Kohonen networks whenever a visual stimulus was encountered. In the “where” pathway, accumulated encoder readings of the left and right wheels from the last encountered landmark to the current one, and the directions of the detected landmarks in frontal view, were processed by the Kohonen network, which generated its categorical outputs. From both pathways, “what” categories of visual landmark objects and “where” categories of the relative travel distance from the last landmark to the current one, as well as the corresponding direction determined by the camera orientation, were sent for prediction in a bottom-up manner. In the prediction process, the RNN learned to predict, in a top-down manner, the perceptual categories of “what” and “where” for landmarks to be encountered in the future. Note that there were no action inputs in this RNN because there was no branching in the current setting. In this model, the bottom-up and top-down pathways did not merely provide inputs and outputs to the system. Rather, they existed for their mutual interactions, and the system was prepared for expected perceptual categories in the top-down pathway before actually encountering the landmarks. This expectation ensured that the system was ready for the next arriving pattern in the Hopfield network and was prepared to direct the camera toward the landmark with correct timing and direction. Actual recognition of the landmark objects was established by dynamic interactions between the two pathways. This means that if the top-down prediction of the visual pattern failed to match the currently encountered one, the perception would result in an illusion constituting a combination of the two patterns. Moreover, a mismatch in the “where” perceptual category could result in failure to attend to any of the expected landmarks to be recognized. Such misrecognition outcomes were fed into the RNN, and the next prediction was made on this basis. Note that the RNN was capable of engaging in “mental rehearsal” of learned sequential images by constructing a closed loop between the prediction outputs and the sensation inputs, in the same way as Yamabico.
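As a rough illustration of the kind of fixed-point dynamics described for the “what” pathway, the sketch below stores binary patterns in a Hopfield network with Hebbian weights and recalls one from a corrupted cue. This is a textbook Hopfield network, not the robot’s actual visual front end, and the random patterns are arbitrary stand-ins for landmark images.

```python
import numpy as np

rng = np.random.default_rng(1)

# Store bipolar (+1/-1) patterns with the Hebbian outer-product rule.
patterns = rng.choice([-1, 1], size=(3, 64))   # three 64-unit "landmarks"
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)                        # no self-connections

def recall(x, steps=10):
    # Synchronous sign updates; the state falls into the nearest
    # fixed-point attractor, i.e., the best-matching stored pattern.
    for _ in range(steps):
        x = np.sign(W @ x)
    return x

# Corrupt a stored pattern by flipping 10 of its 64 units.
cue = patterns[0].copy()
flip = rng.choice(64, size=10, replace=False)
cue[flip] *= -1

print(np.array_equal(recall(cue), patterns[0]))  # usually True
```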
A particular mechanism for internal parameter control was implemented to achieve an adequate interactive balance between the top-down and bottom-up pathways. The mechanism exerted more top-down pressure on the two perceptual categories (“what” and “where”) as the error between the predicted perception and its actual outcome decreased. A shorter time period was also allocated for reading the perceptual outcomes in the Hopfield network in this case. On the other hand, less top-down pressure was exerted when the error between the predicted perception and its actual outcome was larger, and a longer time period was allowed for dynamic perception in the Hopfield network. In other words, in the case of fewer errors, top-down prediction dominated the perception, whereby attention was quickly turned to upcoming expected landmarks, resulting in quick convergence in the Hopfield network. Otherwise, the bottom-up pathway dominated the perception, taking longer to look for landmarks while waiting for convergence in the Hopfield network. Learning of the RNN was conducted for event sequences associated with encountering landmarks. More specifically, experienced sequences of perceptual category outcomes were used as target sequences to be learned. Incremental training of the RNN was conducted after every 15th landmark by adopting a scheme of rehearsal and consolidation, so that phenomena such as “catastrophic forgetting” could be avoided. RNNs lose previously learned memory content quite easily when new sequences are learned, thereby altering acquired connection weights. Therefore, in the new scheme, the RNN “rehearsed” previously learned content in the closed-loop operation and stored the generated sequences in the “hippocampus” (corresponding to short-term memory) together with the newly acquired sequences; catastrophic forgetting of existing memory was avoided by retraining the RNN with both the rehearsed sequences and the newly experienced ones. This rehearsal and consolidation might correspond to dreaming during the REM sleep phase reported in the literature on consolidation learning (Wilson & McNaughton, 1994; Squire & Alvarez, 1995). It has been considered that generalization of our knowledge proceeds significantly through consolidating newly acquired knowledge with older knowledge during sleep. Our robot actually stopped for rest when this rehearsal and consolidation learning was taking place after every fixed period. However, in reality the process would not be so straightforward if the rehearsed and the newly acquired experiences conflict with each other.
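The rehearsal-and-consolidation loop can be summarized in schematic form. The sketch below is pseudocode in Python dress: `rnn.rehearse`, `rnn.train`, and `short_term_buffer` are hypothetical interfaces I introduce for illustration, not an actual library API, and the batching details are assumptions.

```python
def consolidate(rnn, short_term_buffer, n_rehearsals, epochs=50):
    """Retrain on rehearsed old sequences plus newly experienced ones,
    so that new learning does not overwrite old memory (catastrophic
    forgetting). All names here are hypothetical placeholders."""
    # 1. Rehearse: run the trained RNN in closed-loop mode (its own
    #    predictions fed back as inputs) to regenerate old sequences.
    rehearsed = [rnn.rehearse(closed_loop=True) for _ in range(n_rehearsals)]

    # 2. Mix rehearsed sequences with the newly experienced ones held
    #    in the short-term ("hippocampal") buffer.
    training_set = rehearsed + list(short_term_buffer)

    # 3. Consolidate: retrain the RNN on the combined set.
    for _ in range(epochs):
        for seq in training_set:
            rnn.train(seq)

    short_term_buffer.clear()  # consolidated memories need not be kept
```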
One of the aims behind the next experiment I will describe was to examine this point.

7.2.2 Intermittency During Dynamic Learning

The experiment was conducted in a confined workspace containing five landmarks (two colored objects and three corners). It was repeated three times, and in each trial the robot circulated the workspace about 20 times, a limit imposed by the battery life of the robot. We monitored three characteristic features of the robot’s navigation behavior in each run: the prediction error, the bifurcation of the RNN dynamics due to iterative learning, and phase plots representing the attractor dynamics of the RNN at particular times during the bifurcation process. A typical example is shown in Figure 7.7a. The prediction error was quite high at the beginning of all trials because of the initially random connection weights. After the first learning period, the predictability improved to a certain extent in all three trials, but the errors were not eliminated completely. Prediction failures occurred intermittently in the course of the trials, and we can see from the bifurcation diagram that the dynamical structure of the RNN varied. In the typical example shown in Figure 7.7a, a fixed-point attractor appearing in the early periods of the learning iterations is plotted as a single point at each step in the bifurcation diagram, in most cases before the third learning period. After the third learning period, a quasiperiodic or weakly chaotic region appears. Then, after the fourth learning period, it becomes a limit cycle with a periodicity of 5, as can be seen from the five points plotted in the bifurcation diagram at each step during this period. In addition, a snapshot is shown in the phase plot containing five points. After the fifth learning period, a highly chaotic region appears, as indicated by the strange attractor in the corresponding phase plot. Importantly, the state alternates between the strange attractor (chaos) and the limit cycle attractor with a periodicity of 5. In fact, limit-cycle dynamics with a periodicity of 5 appeared most frequently in the course of all trials. A periodicity of 5 is indicative because it corresponds to the five landmarks that the robot encountered in a single turn around the workspace. Indeed, the five points represent a dynamic closure for the steady-state transitions between these five landmarks.
Figure 7.7. Experimental results for a vision robot. (a) Prediction error, bifurcation diagram of the RNN dynamics, and phase plot for two context units at particular times during the learning process. (b) The robot’s trajectories as recorded in the unsteady and steady phases. Adopted from Tani (1998) with permission.
However, it should be noted that this limit cycle with a periodicity of 5 does not remain stationary, because the periodicity disappears at times and other dynamical structures emerge. The dynamic closure observed in the current experiment is not stable but changes in the course of dynamic learning.
From the view of symbolic dynamics (see chapter 5), this can be interpreted as meaning that the robot could mentally simulate various symbolic sequence structures for encountering landmark labels, including deterministic symbol sequences with a period of 5 and sequences with probabilistic state transitions, during the rehearsal. From these results, we can conclude that there were two distinct phases: a steady-state phase represented by the limit-cycle dynamics with a periodicity of 5, and an unsteady phase characterized by nonperiodic dynamics. We also see that transitions between these two phases took place arbitrarily over the course of time, and that differences appeared concurrently in the physical movements of the robot. To clarify why this happened, we compared the actual robot trajectories observed in these two periods. Figure 7.7b shows the robot trajectories measured in these two periods with a camera mounted above the workspace. The trajectory was more winding in the unsteady phase than in the steady phase, particularly in the way objects and corners were approached. From this it was inferred that the robot’s maneuvers were more unstable in the unsteady phase because it spent more time on the visual recognition of objects due to the higher prediction error. So, the robot faced a higher risk of misdetecting landmarks when its trajectory meandered during this period, which was indeed the case in the experiments. In the steady phase, however, the detection sequence of landmarks became more deterministic and travel was smooth, with greater prediction success. What is important here is that these steady and unsteady dynamics were not only attributable to the internal cognitive processes arising in the neural network, but also were expressed in the physical movements of the robot’s body as it interacted with the external environment. Finally, we measured the distribution of interval steps between catastrophic error peaks (error > 0.5) observed in three different experiments with the robot (Figure 7.8). The graph indicates that the distribution of the breakdown intervals has a long-tail characteristic with a power-law-like profile. This indicates that the shift from the steady to the unsteady phase takes place intermittently, without a dominant periodicity. The observed intermittency might be due to the tangency developed in the whole dynamics (see section 5.1). The observation here might also be analogous to the phenomenon of so-called chaotic itinerancy (Tsuda et al., 1987; Ikeda et al., 1989; Kaneko, 1990; Aihara et al., 1990), in which state trajectories tend to visit multiple pseudoattractors one by one, itinerantly, in a particular class of networks consisting of dynamic elements.
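The interval analysis itself is straightforward to reproduce. The sketch below assumes a recorded per-step prediction-error series (the random array is only a stand-in for real data); it extracts the intervals between catastrophic peaks and prints a log-binned histogram. With real data, a power law would appear as a roughly straight line when both axes are plotted on log scales.

```python
import numpy as np

# Hypothetical prediction-error time series, one value per event step;
# a stand-in for the robot's recorded data.
errors = np.random.rand(2000)

# Steps at which a "catastrophic" error peak (> 0.5) occurred.
peaks = np.flatnonzero(errors > 0.5)
intervals = np.diff(peaks)

# Log-binned histogram of intervals between successive peaks.
bins = np.logspace(0, np.log10(intervals.max() + 1), 12)
counts, edges = np.histogram(intervals, bins=bins)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:6.1f}-{hi:6.1f}: {c}")
```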
Figure 7.8. Distribution of interval steps between catastrophic prediction error peaks greater than 0.5, where the x axis represents the interval steps and the y axis represents the frequency of appearance in the corresponding range; both axes are in log scale.
Tsuda and colleagues (1987) showed that intermittent chaos mechanized by means of tangency in nonlinear mapping (see section 5.1) generated the chaotic itinerancy observed in their memory dynamics model. The robotics experiment described in this section has demonstrated that phenomena similar to chaotic itinerancy can also emerge in the learning dynamics of a network model coupled with a physical environment. Dynamic learning processes during interaction with the outer environment can generate complex trajectories that alternate between stabilizing the memory contents and their breakdown.

7.2.3 Accounting for the “Minimal Self”

An interesting observation from the last experiment is that the transitions between steady and unsteady phases occurred spontaneously, even though the workspace environment was static. In the steady phase, coherence is achieved between the internal dynamics and the environmental dynamics when subjective anticipation agrees closely with observation. All the cognitive and behavioral processes proceed smoothly and automatically, and no distinction can be made between the subjective mind and the objective world. In the unsteady phase, this distinction becomes rather explicit as conflicts, in terms of the prediction error, are generated between the expectations of the subjective mind and the outcome generated in the objective world.
Consequently, it is at this moment of incoherence that the “self-consciousness” of the robot arises, whereby the system’s attention is directed toward the conflicts to be resolved. On the other hand, in the steady phase, “self-consciousness” is reduced substantially, as there are no conflicts demanding the system’s attention. This interpretation of the experimental observations corresponds to the aforementioned analysis of Heidegger’s example of the hammer missing the nail (see section 3.4), as well as to James’ concept of the stream of consciousness (see section 3.6), in which the inner stream consists of transient and substantive parts, and the self can become consciously aware momentarily in the discrete event of breakdown. With reference to the Scottish philosopher David Hume, Gallagher (2000) considered that this “momentary self” is in fact a “minimal self,” which should be distinguished from the self-referential self or narrative self provided with a past and a future in the various stories that we tell about ourselves. However, one question still remains for us to address here: Why couldn’t the coherence in the steady phase last longer, with the breakdown into incoherence taking place only intermittently? It seems that the complex time evolution of the system emerged from mutual interactions between multiple local processes. It was observed that changes in the visual attention dynamics due to changes in the predictability caused drifts in the robot’s maneuvers. These drifts resulted in misrecognition of upcoming landmarks, which led to modification of the dynamic memory stored in the RNN and a consequent change in predictability. Dynamic interactions took place as chain reactions with certain delays among the processes of recognition, prediction, perception, learning, and acting, wherein we see the circular causality between the subjective mind and the objective world. So, this circular causality might then provide a condition for developing a certain criticality. The aforementioned circular causality can be explained more intuitively as follows. When the learning error decreases as learning proceeds, stricter timing of visual recognition is required for upcoming landmarks, because only a short period, proportional to the magnitude of the current error, is allowed for recognition of the objects. In addition, the top-down image for each upcoming landmark pattern is shaped into a fixed one, without variance. This is because the same periodic patterns are learned repeatedly and the robot tends to trace exactly the same trajectories in the steady phase. If all goes completely as expected, this strictness grows as the prediction error decreases further.
Ultimately, at the peak of strictness, catastrophic failure in the recognition of landmark sequences can occur as a result of even minor noise perturbation, because the entire system has evolved too rigidly by building up relatively narrow and sharp top-down images. The described phenomena remind me of a theoretical study of sand pile behavior conducted by Bak and colleagues (1987). In their simulation study, grains of sand were dropped onto a pile, one at a time. As the pile grew, its sides became steeper, eventually reaching a critical state. At that very moment, just one more grain would have triggered an avalanche. I consider that this critical state is analogous to the situation generating catastrophic failures in recognizing the landmarks in the robotics experiment. Bak found that although it is impossible to predict exactly when an avalanche will occur, the sizes of the avalanches are distributed in accordance with a power law. The natural growth of the pile to a critical state is known as self-organized criticality (SOC), and it has been found to be ubiquitous in various other phenomena as well, such as earthquakes, volcanic activity, the Game of Life, landscape formation, and stock markets. A crucial point is that the evolution toward a certain critical state itself turns out to be a stable mechanism in SOC. It is as if a critical situation such as “tangency” (see section 5.1) can be preserved with structural stability in the system. This seems to be possible in systems of relatively high dimension that allow local nonlinear interactions inside (Bak et al., 1987). Although we might need a larger experimental dataset to confirm the presence of SOC in the observed results, I speculate that some dynamic mechanism for generating criticality could be responsible for the autonomous nature of the “momentary self,” which James metaphorically spoke of as an alternation of periods of flight and perching throughout a bird’s life. Here, the structure of consciousness responsible for generating the momentary self can be accounted for by emergent phenomena resulting from the aforementioned circular causality. Incidentally, readers may wonder how we can appreciate a robot with such fragility in its behavior characterized by SOC: the robot could “die” by crashing into the wall due to a large fluctuation at any moment. I argue, however, that the potential for an authentic robot arises from this fragility (Tani, 2009), remembering what Heidegger said about the authentic being of man, who resolutely anticipates death as his ownmost possibility (see section 3.4). Following Heidegger, the vivid “nowness” of a robot might be born in this criticality as a consequence of the dynamic interplay between looking ahead to the future for possibilities and regressing to the conflictive past through reflection. In this, the robot may ultimately achieve authentic being in terms of its irreplaceable behavioral trajectories.
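As a concrete illustration of the sandpile model described above, the following is a minimal Bak–Tang–Wiesenfeld simulation. It is a sketch of the published model, not of the robot experiment: grains are dropped one at a time, sites topple when they hold four or more grains, and the resulting avalanche sizes follow a heavy-tailed, power-law-like distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
grid = np.zeros((N, N), dtype=int)
avalanche_sizes = []

for _ in range(20000):
    # Drop one grain at a random site.
    i, j = rng.integers(0, N, size=2)
    grid[i, j] += 1
    size = 0
    # Relax: any site with 4+ grains topples, sending one grain to each
    # neighbor; grains falling off the edge are lost (open boundaries).
    while True:
        unstable = np.argwhere(grid >= 4)
        if len(unstable) == 0:
            break
        for x, y in unstable:
            grid[x, y] -= 4
            size += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < N and 0 <= y + dy < N:
                    grid[x + dx, y + dy] += 1
    if size > 0:
        avalanche_sizes.append(size)

# Heavy tail: many tiny avalanches, rare system-wide ones.
sizes = np.array(avalanche_sizes)
print(len(sizes), sizes.mean(), sizes.max())
```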
Finally, we may ask whether the account provided so far could open a new pathway to the hard problem of consciousness characterized by Chalmers (see section 4.3). I would say “yes,” by observing the following logic. The top-down pathway of predicting perceptual event sequences exemplifies subjectivity because it is developed solely along with the first-person experiences of perceptual events accumulated through iterative interactions in the objective world. Subjectivity is not a state but a dynamic function of predicting the perceptual outcomes resulting from interactions with the objective world. If this is granted, the consciousness that is the first-person awareness of one’s own subjectivity can originate only from a sense of discomfort in one’s own predictability, that is, the prediction error,2 which is also a first-person experience, but at another level, of the second order (where the contents of prediction are the first order). Subjectivity as a mirror of the objective world cannot become aware just by itself alone. It requires differentiation from the objective world as another pole, by means of interacting with it. To this end, the subject and the object turn out to be an inseparable entity through the circular causality between them, wherein the open dynamics characterized by intermittent transitions between the predictable steady phase and the conflictive unsteady one emerges. As such, this interpretation of the experimental results reviewed in this chapter provides insight into the fundamental structure of consciousness, rather than merely into a particular state of consciousness or unconsciousness at a moment.
7.3. Summary

This chapter introduced two robotics experiments on prediction learning in the navigation domain, utilizing mobile robots, with a focus on how robots can acquire subjective views of the external world through iterative interactions with it. The first experiment focused on the problem of learning to extract compositionality from sensory–motor experiences, and on the grounding of that compositionality.
2. Recently, Karl Friston (2010) proposed that a likelihood measure, given by the prediction error divided by an estimate of its variance, can represent the “surprise” of the system. This measure might quantify the state of consciousness better than the error itself.
The experimental results using the Yamabico robot showed that the compositionality hidden in the topological trajectories in obstacle environments can be extracted by the predictive model instantiated in the RNN. The navigation of the robot became inherently robust because the mechanism of autorecovery was supported by the development of a global attractor in the RNN dynamics. We concluded that symbol-like structures self-organized in neurodynamic systems can be naturally grounded in the physical environment by allowing active interactions between them in a shared metric space. The second experiment addressed the phenomenological problem of “self” by further extending the aforementioned robot navigation experiments. In this new experiment, a vision-based mobile robot implemented with an RNN model learned to predict landmark sequences experienced during its dynamic exploration of the environment. It was shown that the developmental learning process during exploration switches spontaneously between coherent phases (when the top-down prediction agrees with the bottom-up sensation) and incoherent phases (when conflicts appear between the two). By investigating possible analogies between this result and the phenomenological literature on the self, we drew the conclusion that the open dynamic structure characterized by SOC can account for the underlying structure of consciousness through which the “momentary self” appears autonomously. It is interesting to note that, although I emphasized the grounding of the subjective image of the world in the first navigation experiment, the second experiment suggested that the momentary self could appear instead in the sense of the groundlessness of subjectivity. The apparent gap between these two originated from two different research attitudes for exploring cognitive minds, which are revisited in later chapters. One drawback of the models presented for robot navigation in this chapter is that they could not provide direct experience of perceptual flow to the robots, because the models operated in an event-based manner that was designed and programmed by the experimenters. The next chapter introduces a set of robotics experiments focusing on mirror neuron mechanisms, in which we consider how event-like perception develops out of the continuous flow of perceptual experience, as related to the phenomenological problem of time perception.
8 Mirroring Action Generation and Recognition with Articulating Sensory–Motor Flow
In the physical world, everything changes continuously in time, as a river flows; discontinuity is just a special case. Sensory-motor states change continuously, and neural activation states in essential dimensions do so too, as Churchland observed (2010; also see section 4.3). If this is granted, one of the most difficult questions in understanding the sensory–motor system is how continuous sensory–motor flows can be recognized as well as generated structurally, that is, recognized as segmented into “chunks” as well as generated with articulation. According to the motor schemata theory proposed by Michael Arbib (1981), a set of well-practiced motor programs or primitives is stored in long-term memory, and different combinations of these programs in space and time can generate a variety of motor actions. Everyday actions, such as picking up a mug to drink some coffee, can be generated by concatenating different chunks or behavioral schemes: the vision system attending to the mug in one chunk, the hand approaching the handle of the mug in the next, followed by the hand gripping the handle in the final chunk.
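In skeletal form, the schemata idea amounts to composing an action from a small library of reusable primitives. The sketch below is an illustrative toy, not Arbib’s formal theory: each primitive is a short trajectory generator over a one-dimensional “posture,” and an action is the concatenation of primitive outputs, each chunk starting where the previous one ended.

```python
import numpy as np

# A library of motor primitives: each maps a start posture to a short
# trajectory chunk (here, simple 1-D posture profiles over 20 steps).
def reach(x0):   return x0 + np.linspace(0.0, 1.0, 20)
def grip(x0):    return x0 + 0.1 * np.sin(np.linspace(0, 2 * np.pi, 20))
def retract(x0): return x0 + np.linspace(0.0, -1.0, 20)

primitives = {"reach": reach, "grip": grip, "retract": retract}

def compose(schema, x0=0.0):
    """Concatenate primitive chunks; each starts at the posture where
    the previous chunk ended."""
    trajectory = []
    x = x0
    for name in schema:
        chunk = primitives[name](x)
        trajectory.extend(chunk)
        x = chunk[-1]
    return np.array(trajectory)

# "Pick up the mug": reach, then grip, then retract.
action = compose(["reach", "grip", "retract"])
print(action.shape)  # (60,) -- three 20-step chunks
```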
Similarly, Yasuo Kuniyoshi proposed that complex human actions can be recognized by structurally segmenting the visual perceptual flow into concatenated reusable patterns (Kuniyoshi et al., 1994). Kuniyoshi and colleagues (2004) also showed, in a psychological experiment, that recognizing the timing of such segmentation is essential for extracting crucial information about the observed action. The problem of segmentation is also closely related to the aforementioned phenomenological problem of time perception considered by Husserl, which concerns the question of how a flow of experiences at the preempirical level can be consciously recalled in the form of articulated objects or events at the objective time level (section 3.2). Please note that we did not address this problem in the previous experiments with the Yamabico robot, because segmentation of sensory flows was mechanized by the hand-coded program for branching: Yamabico received sequences of discontinuous sensory states at each branching point. In this chapter, our robots have to deal with a continuous flow of sensory-motor experiences. We then investigate how these robots can acquire a set of behavioral schemes and how these can be used for recognizing as well as generating whole complex actions by segmenting, or articulating, the sensory-motor flow. I presume that mirror neurons are integral to such processes because I speculate that they encode basic behavior schemes in terms of predictive coding (Rao & Ballard, 1999; Friston, 2010; Clark, 2015), which can be used for both recognition and generation of sensory-motor patterns, as mentioned previously (see section 4.2). This chapter develops this idea into a synthetic neurorobotics model. The following sections introduce our formulation of the basic dynamic neural network model for the mirror neuron system. The formulation is followed by neurorobotics experiments utilizing the model in a set of cognitive behavior tasks, including the creation of novel patterns via learning a set of behavior patterns, imitative learning, and the acquisition of actional concepts via associative learning between a quasi-language and motor behaviors. The analysis of these experimental results provides us with some insight into how the interaction between the top-down prediction/generation process and the bottom-up recognition process can achieve segmentation of a continuous perceptual flow into meaningful chunks, and how the distributed representation schemes adopted in the model can enhance the generalization of learned behavioral skills, knowledge, and concepts.
8.1. A Mirror Neuron Model: RNNPB

In this section, we examine a dynamic neural network model, the recurrent neural network with parametric biases (RNNPB), which I and my colleagues (Tani, 2003; Tani et al., 2004) proposed as a possible model to account for the underlying mechanism of mirror neurons (Rizzolatti et al., 1996). The RNNPB model adopts the distributed representation framework, by way of which multiple behavioral schemes can be memorized in a single network sharing its neural resources. This contrasts with the local representation framework, in which each memory content is stored separately in a distinct local module network (Wolpert & Kawato, 1998; Tani & Nolfi, 1999; Demiris & Hayes, 2002; Shanahan, 2006). In the RNNPB, the inputs of a low-dimensional static vector, the parametric bias (PB), represent the intention for the action to be enacted. The RNNPB generates a prediction of the perceptual sequence for the outcome of enacting the intended action. The RNNPB can model the mirror neuron system in an abstract sense because the same PB vector value accounts for both generation and recognition of the same action in terms of the corresponding perceptual sequence pattern. This idea corresponds to the aforementioned concept of the predictive model in the parietal cortex associated with mirror neurons, shown in Figure 4.6. From the viewpoint of dynamical systems, the PB vector is considered to play the role of a bifurcation parameter in nonlinear dynamical systems, as the PB shifts the dynamic structure of the RNN for generating different perceptual sequences. Let’s look at the detailed mechanism of the model (Figure 8.1). The RNNPB can be regarded as a predictive coding or generative model whereby different target perceptual sequence patterns, p_t (t = 0, ..., l-1), can be learned for regeneration as mapped from the corresponding PB vector values. The PB vector for each learned sequence pattern is determined autonomously, without supervision, by utilizing the error signals back-propagated to the PB units, whereas the synaptic weights (common to all patterns) are determined during the learning process, as shown in Figure 8.1a. Readers should note that the RNNPB can avoid the frame problem described in section 4.2 because the dynamic mapping to be learned is not from arbitrary actions to perceptual outcomes at each time step but from a specific set of actional intentions to the corresponding perceptual sequences.
Figure 8.1. The system flow of the recurrent neural network with parametric biases (RNNPB) in (a) learning mode, (b) top-down generation mode, where the intention is set externally in the PB, and (c) bottom-up recognition mode, wherein the intention in the PB is inferred by utilizing the back-propagated error.
This makes the learning process feasible because the network is trained not for all possible combinatorial trajectories but only for selected ones. After the learning is completed, the network is used both for generating (predicting) and for recognizing perceptual sequences. The learned perceptual sequences can be regenerated by means of the forward dynamics of the RNNPB, with the PB set to the values determined in the learning process (see Figure 8.1b). This is the top-down generation process, with the corresponding actional intention represented by the PB. Perceptual sequences can be generated and predicted either in the open-loop mode, by receiving the current perceptual inputs from the environment, or in the closed-loop mode, wherein motor imagery is generated by feeding back the network’s own prediction outputs into the inputs (the dotted line indicates the feedback loop). On the other hand, experienced perceptual sequences can be recognized by searching for the optimal PB values that minimize the errors between the target sequences to be recognized and the output sequences to be generated, as shown in Figure 8.1c. This is the bottom-up process of inferring the intention, in terms of the PB, for the given perceptual sequences. As an experiment described later shows, generation of action and recognition of the resultant perceptual sequences can be performed simultaneously.
More specifically, behavior is generated by predicting the change in posture in terms of proprioception, depending on the current PB, while the PB is updated in the direction of minimizing the prediction error for each incoming perceptual input. By this means, the intention–perception cycle can be achieved in the RNNPB, whereby the circular causality between intention and perception appears. Note also that both action learning and generation are formulated as dynamic processes of minimizing the prediction error (Tani, 2003), a formulation analogous to the free-energy principle proposed by Karl Friston (2005; 2010). Here, I should explain the learning process more precisely, because its mechanism may not necessarily be intuitive. When learning is commenced, the PB vector of each training sequence is set to a small random value. The forward top-down dynamics initiated with this temporarily set PB vector generates a predictive sequence for the training perceptual sequence. The error generated between the target training sequence and the output sequence is back-propagated along the bottom-up path, iterated backward through time steps via recurrent connections, whereby the connection weights are modified in the direction of minimizing the error signal. The error signal is also back-propagated to the PB units, whose values for each training sequence are modified. Here, we see that the learning proceeds through dense interactions between the top-down regeneration of the training sequences and the bottom-up regression of the regenerated sequences utilizing the error signals. The internal structures for embedding multiple behavior schemata can be gradually developed through this type of bottom-up and top-down interaction by self-organizing a distributed representation in the network. It is also important to note that the generation of sequence patterns is not limited to trained ones. The network can create a variety of similar or novel sequential patterns depending on the values of the PB vector. It is natural to assume that similar PB vectors generate similar sequence patterns, whereas dissimilar ones can generate quite different patterns. The investigation of these characteristics is one of the highlights in the study of the current model, characterized by its distributed representational nature. The following subsections detail such characteristics of the RNNPB model through robotics experiments using it.
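To make the dual role of back-propagation concrete, here is a minimal PyTorch sketch of the RNNPB training and recognition loops. It is an assumption-laden toy, not the original implementation: the network sizes, the optimizer, and the use of `nn.RNNCell` are my choices, and real RNNPB variants differ in architecture and training details. The key point it shows is that the same back-propagated error updates the shared weights and the PB vectors during learning, while during recognition the weights are frozen and only a PB vector is regressed.

```python
import torch
import torch.nn as nn

class RNNPB(nn.Module):
    def __init__(self, p_dim=2, c_dim=20, pb_dim=2, n_seq=2):
        super().__init__()
        self.cell = nn.RNNCell(p_dim + pb_dim, c_dim)
        self.out = nn.Linear(c_dim, p_dim)
        # One learnable PB vector per training sequence.
        self.pb = nn.Parameter(0.01 * torch.randn(n_seq, pb_dim))

    def forward(self, seq, pb):
        # Open-loop prediction: p_t (plus the static PB) -> p_{t+1}.
        c = torch.zeros(1, self.cell.hidden_size)
        preds = []
        for t in range(len(seq) - 1):
            x = torch.cat([seq[t], pb]).unsqueeze(0)
            c = self.cell(x, c)
            preds.append(self.out(c).squeeze(0))
        return torch.stack(preds)

# Two toy target sequences (e.g., 2-D proprioceptive patterns).
t = torch.linspace(0, 6.28, 30)
seqs = [torch.stack([t.sin(), t.cos()], 1),
        torch.stack([0.5 * t.sin(), t.sin()], 1)]

model = RNNPB()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(2000):            # learning: weights AND PBs updated
    opt.zero_grad()
    loss = sum(((model(s, model.pb[i]) - s[1:]) ** 2).mean()
               for i, s in enumerate(seqs))
    loss.backward()
    opt.step()

# Recognition: freeze the weights, regress a fresh PB for an observed
# sequence by back-propagating the prediction error to the PB alone.
for p_ in model.parameters():
    p_.requires_grad_(False)
pb = torch.zeros(2, requires_grad=True)
pb_opt = torch.optim.Adam([pb], lr=0.05)
for step in range(200):
    pb_opt.zero_grad()
    err = ((model(seqs[1], pb) - seqs[1][1:]) ** 2).mean()
    err.backward()
    pb_opt.step()
print(pb.data, model.pb.data[1])  # inferred PB should approach, or
                                  # functionally match, the learned one
```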
8.2. Embedding Multiple Behaviors in Distributed Representation

A simple experiment involving learning a set of target motor behaviors was conducted to examine the PB mapping, in which a structure emerges as a result of self-organization through the learning process. The PB mapping shows how points in the PB vector space are mapped to the sequence patterns generated after learning a set of target patterns. In this experiment, an RNNPB was trained on five different movement patterns of a robotic arm with four degrees of freedom. The five target movement patterns, in terms of four-dimensional proprioceptive (joint angle) sequence patterns, are shown in Figure 8.2. Teach-(1, 2, and 3) are discrete movements with different end points, and Teach-(4 and 5) are different cyclic movements. The arrows associated with those sequence patterns indicate the corresponding PB vector points determined in two-dimensional space during training. It can be seen that the PB vectors for all three discrete movement patterns appear in the upper right region and the PB vectors for the two target cyclic movement patterns appear in the lower right region of the PB space, which was found to be divided into two regions (the boundary is shown as a dotted curve), as shown in Figure 8.2. The area above the dotted curve is the region for generating discrete movements, and the remaining area under the dotted curve is for cyclic movement patterns (including nonperiodic ones). An important observation is that the characteristic landscape is quite smooth in the region of discrete movements, whereby if the PB vector is changed slightly, the destination point of the discrete movement changes only slightly. In particular, inside the triangular region defined by the three PB points corresponding to the trained discrete movements, the profiles of all generated sequence patterns seem to be interpolations of these three trained sequence patterns. On the other hand, the characteristic landscape in the region of periodic movement patterns is quite rugged against changes in the PB values. The profiles of generated patterns could change drastically with changes in the PB vector in this region. Patterns generated from this region could include a variety of novel patterns, such as Novel-(1 and 2) shown in Figure 8.2. Novel-2 is a nonperiodic pattern that is especially difficult to imagine as being derived from the profiles of the training patterns.
Figure 8.2. Mapping from the PB vector space, with two-dimensional principal components, to the generated movement pattern space.
One interesting observation here is that two qualitatively distinct regions appeared, namely the discrete movement part and the cyclic movement part, the latter including nonperiodic patterns. The former successfully achieves generalization in terms of interpolation of trained sequence patterns, presumably because it is easy to extract common structures shared by the three trained discrete movements, which exhibit fixed-point dynamics with various destination points. In the latter case, by contrast, generalization is difficult to achieve because structures shared between the two cyclic movement patterns, with their different shapes, periodicities, and amplitudes, are equally difficult to extract. This results in a highly nonlinear landscape in this region, due to the embedding of quite different dynamic patterns in the same region. In such a highly nonlinear landscape, diverse temporal patterns can be created by changing the PB vector. The aforementioned experimental result fits very well with James’s thought (James, 1892) that when the memory hosts complex relations or connections between images of past experiences, images can be regenerated with spontaneous variations into streams of consciousness (see section 3.6). James predicted this type of phenomenon without conducting any experiments or simulations, but only from formal introspection. Now that we have covered the basic characteristics of the RNNPB model, the following subsections introduce a set of cognitive robotics experiments utilizing the RNNPB model with a focus on mirror neuron functions. First, the next subsection looks at the application of the RNNPB model to a robot task of imitation learning.
8.3. Imitating Others by Reading Their Mental States

In section 5.2, I briefly explained the development of imitation behavior, with emphasis on its early stage, in which the imitation mechanism is accounted for by simple stimulus response. I also introduced a robot study by Gaussier and colleagues (1998) showing that robots can generate synchronized imitation with other robots using acquired visuo-proprioceptive mapping under the homeostasis principle. Rizzolatti and colleagues (2001) characterized the neural mechanism at this level as response facilitation without understanding meaning.
Experimental results using monkeys indicated that the same motor neurons in the rostral part of the inferior parietal cortex are activated both when a monkey generates meaningless arm movements and when it observes them. Also, as mentioned in section 4.2, it was observed that the same F5 neurons in monkeys fire when purposeful motor actions, such as grasping an object, holding it, and bringing it to the mouth, are either generated or observed. The neural mechanism at this level is called response facilitation with understanding meaning (Rizzolatti et al., 2001), which is considered to correspond to the third stage of the “like me” mechanism hypothesized by Meltzoff (2005). In this stage, “my” mental state can be projected onto those of others who act “like me.” I consider that our proposed mechanism for inferring the PB states in the RNNPB can account for the “like me” mechanism at this level. Let’s look here at the results of a robotics experiment that my team conducted to elucidate how the recognition of others’ actional intentions can be mirrored in one’s own generation of the same action, wherein the focus falls again on the online error regression mechanism used in the RNNPB model (Ito & Tani, 2004; Ogata et al., 2009).

8.3.1 Model and Robot Experiment Setup

This experiment on imitative interactions between robots and humans was conducted using the Sony humanoid robot QRIO (Figure 8.3). In the learning phase of this experiment, the robot learns multiple hand movement patterns demonstrated by the experimenter. The RNNPB learns to predict how the positions of the experimenter’s hands (perceived as a visual image) change in time, in terms of a dynamic mapping from v_t to v_t+1. Simultaneously, the network also learns, in an imitative manner, to predict how its own arms (4 DOF joints for each arm) move in correspondence with the observed movements performed by the experimenter. This prediction takes the form of a dynamic mapping of arm proprioception from p_t to p_t+1, acquired through direct training performed by a teacher who guides the robot’s arms by moving them directly while following the experimenter’s hand movements. The tutoring is conducted for each movement pattern by determining its corresponding PB vector for encoding.
Figure 8.3. The Sony humanoid robot QRIO employed in the imitation learning experiment. Reproduced from Tani et al. (2004) with permission.
In the interaction phase, when one of the learned movement patterns is demonstrated by the experimenter, the robot is expected to recognize it by inferring an optimal PB vector for reconstructing the movement pattern, through which its own corresponding movement pattern may be generated. When the experimenter freely switches his or her demonstration of hand movement patterns from one to another, the movement patterns generated by the robot should change accordingly, through inference of the optimal PB vector.

8.3.2 Results: Reading Others’ Mental States by Segmenting Perceptual Flow

In the current experiment, after the robot was trained on four different movement patterns, it was tested in terms of its dynamic adaptation to sudden changes in the patterns demonstrated by the experimenter. Figure 8.4 shows one of the obtained results, in which the experimenter switched demonstrated movement patterns twice during a trial of 160 steps.
Figure 8.4. Dynamic changes in the movements generated by the robot, triggered by changes in the movement patterns demonstrated by the experimenter. The time evolution profile of the perceived position of the experimenter’s hand and the profile predicted by the robot are shown in the first and the second rows, respectively. The third and fourth rows show the time profiles for the predicted proprioception (joint angles) of the robot’s arm and the PB vectors, respectively. Adopted from Tani et al. (2004) with permission.
It can be seen that when the movement pattern demonstrated by the experimenter was shifted from one of the learned patterns to another, the visual and proprioceptive prediction patterns were also changed correspondingly, accompanied by stepwise changes in the PB vector. Here, it can be seen that the continuous perceptual flow was segmented into chunks of different learned patterns via sudden changes in the PB vector, mechanized by bottom-up error regression. This means that the RNNPB was able to read the transitions in the mental states of the experimenter by segmenting the flow. There was an interesting finding that connects the ideas of compositionality and segmentation. When the same robot was trained on a long sequence that consisted of periodic switching between two different movement patterns, the whole sequence was encoded by a single PB vector without segmentation. This happened because perception of every step in the trained sequence was perfectly predictable, including the moment of switching between the movement patterns, owing to the exact periodicity of the tutored sequence. When everything becomes predictable, all moments of perception belong to a single chunk without segmentation. Compositionality entails potential unpredictability because there is always some arbitrariness, perhaps by “free will,” in combining a set of primitives into the whole. Therefore, segmentation of the whole compositional sequence into primitives can be performed by using the resultant prediction error. In this situation, what is read from the experimenter’s mind might be his or her “free will” in alternating among primitive patterns. The aforementioned results accord with the phenomenology of time perception. Husserl assumed that the subjective experience of “nowness” extends to include fringes, in the sense of both the experienced past and the future, in terms of retention and protention, as described in section 3.3. This description of retention and protention at the preempirical level seems to correspond directly to the forward dynamics undertaken by RNNs (Tani, 2004). RNNs perform prediction by retaining the past flow in a context-dependent way. This self-organized contextual flow of the forward dynamics in RNNs could be responsible for the phenomenon of retention. Even if Husserl’s notion of nowness in terms of retention and protention is understood as corresponding to contextual dynamics in RNNs, the following question still remains: What are the boundaries of nowness? The idea of segmentation could be the key to answering this question.
is segmented (Tani, 2004). In the RNNPB model, when the external perceptual flow cannot be matched with the internal flow corresponding to the anticipated outcome, the resultant error drives PB vector change. When the prediction is not fulfilled, the flow is segmented into chunks, which are no longer just parts of the flow but rather represent events that are identified as one of the perceptual categories by the PB vector. This identification process takes a certain period of effort accompanied by "consciousness" because of delays in the convergence of the PB regression dynamics, as observed in the preceding experiments. This might also explain the aforementioned observation by Varela (1999) that the flow of events in the immediate past is experienced just as an impression, which later becomes a consciously retrieved object after undergoing segmentation. Finally, I claim that the projection of "my" mental state onto those of others who act "like me," assumed in the third stage of Meltzoff's (2005) "like me" mechanism, should accompany such a conscious process.

8.3.3 Mutual Imitation Game

The previous experiment involved unidirectional interaction in which only the robot adapted to movements demonstrated by the experimenter. Our next experiment examined the case of mutual interaction by introducing a simple game played by the robot and human subjects. In this new experiment, the robot was trained on four movement patterns by the experimenters, and then human subjects who were unaware of what the robot had learned participated. In the imitation game, the subjects were instructed to identify as many movement patterns as possible and to synchronize their movements with those of the robot through interactions. Five subjects participated in the experiment, and each subject was allowed to interact with the robot for one hour. Although most of the subjects eventually identified all of the movement patterns, the interaction was not trivial for them. If they merely attempted to follow the robot's movement patterns, convergence could not be achieved in most instances because the PB values fluctuated wildly when unpredictable hand movement patterns were demonstrated. Indeed, the robot tended to generate diverse movement patterns due to fluctuations in the PB. Also, if the subjects attempted to
execute their desired movement patterns regardless of the robot's movements, the robot could not follow them unless the movement patterns of the subjects corresponded to those already learned by the robot.

The movement patterns of the human and the robot, as well as the neural activity (PB units) obtained during interaction in the imitation game, are plotted in Figure 8.5 in the same format as in Figure 8.4. We can see that diverse movement patterns are generated by the robot and the human subject, accompanied by frequent shifts during their interactions. It can be seen that matching by synchronization between the human subject's movements and the robot's predictions is achieved after an exploratory phase (see the sections denoted as "Pattern 1" and "Pattern 2" in the figure). However, it was often observed that such matching was likely to break down before a match was achieved for another pattern.

[Figure 8.5 near here: three rows of time-series plots over steps — actual human hand position, predicted human hand position, and PB activations — with sections marked Pattern 1 and Pattern 2.]
Figure 8.5. A snapshot of parameter values obtained during the imitation game. Movement matching by synchronization between the human subject and the robot took place momentarily, as can be seen from the sections denoted as Pattern 1 and Pattern 2 in the plot.

An interesting observation involves the spontaneous switching of initiative between the robot and the subjects. In postexperiment interviews, the subjects reported that when they felt that the robot's movement pattern became close to theirs, they just kept following the movements passively to stabilize the pattern. However, when they felt that their movements and those performed by the robot could not synchronize, they often initiated new movement patterns, hoping that the robot would start to follow them and eventually synchronize its movements with those of the subject. This observation is analogous to the turn taking during imitative exchange observed by Nadel (2002), as described in section 5.2. Another interesting observation was that spontaneous transitions between the synchronized phase and the desynchronized phase tended to occur more frequently in the middle of each session, when the subject was already familiar with the robot's responses to some degree. When the subjects managed to reach a synchronized movement pattern, they tended to maintain the attained synchronization for a short period of time to memorize the pattern. However, this synchronization could break down after a while due to various uncertainties in the mutual interactions. Even small perturbations could confuse the subjects if they were not yet fully confident of the robot's repertoire of movement patterns. This, too, can be explained by the mechanism of self-organized criticality (see section 7.2), which can emerge only during a specific period characterized by an adequate balance between predictability and unpredictability in the course of the subjects' developmental learning in the mutual imitation game. Turn taking was observed more frequently during this period. These results imply that vivid communicative exchanges between individuals can appear by utilizing and anticipating such criticality.

The current experimental results of the imitation game suggest that imitation provides not only the simple function of storing and regenerating observed patterns, but also rich functions for spontaneously generating novel patterns from learned ones through dynamic interactions with others. In this context, we may say that imitation for human beings is a means for developing diverse creative images and actions through communicative interaction, rather than simply for mimicking action patterns demonstrated by others "like me." The next subsection explores how mirror neurons may function in developing actional concepts through the association of language with action learning.
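Before moving on to language, the bottom-up error regression that drives these PB shifts and segmentations can be made concrete with a small sketch. The code below (Python/NumPy) is a minimal illustration, not the trained RNNPB from the experiments: a fixed random network stands in for a trained forward model, and the dimensions, learning rate, and segmentation threshold are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
PB_DIM, OBS_DIM, T = 2, 3, 200

# A stand-in forward model: a fixed random network playing the role of a
# trained RNNPB (illustrative only).
A = rng.normal(0.0, 1.0, (OBS_DIM, OBS_DIM))
B = rng.normal(0.0, 1.0, (OBS_DIM, PB_DIM))

def predict(x, pb):
    return np.tanh(A @ x + B @ pb)

# An observed flow whose "true" PB switches midway, mimicking the
# experimenter switching movement patterns.
pb_true = np.where(np.arange(T)[:, None] < T // 2, [1.0, -1.0], [-1.0, 1.0])
obs = np.zeros((T, OBS_DIM))
x = np.zeros(OBS_DIM)
for t in range(T):
    x = predict(x, pb_true[t])
    obs[t] = x

# Bottom-up error regression: update the PB estimate by gradient descent
# on the prediction error, as the RNNPB does during recognition.
pb, lr = np.zeros(PB_DIM), 0.2
for t in range(T - 1):
    y = predict(obs[t], pb)            # top-down prediction of the next input
    err = obs[t + 1] - y               # prediction error
    grad = B.T @ ((1 - y**2) * err)    # error back-propagated to the PB units
    pb += lr * grad
    if np.linalg.norm(err) > 0.5:      # a large error marks a segmentation point
        print(f"step {t}: error {np.linalg.norm(err):.2f}, PB = {pb.round(2)}")
```

The point of the sketch is the update rule: the PB estimate moves down the gradient of the prediction error, so the error spikes, and the PB shifts stepwise, precisely where the observed flow switches to a different pattern.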
8.4. Binding Language and Action

In conventional neuroscience, language processing and action processing have been treated as independent areas of research, simply because of the different areas of expertise necessary for conducting studies in each of those areas. However, as mentioned in section 4.2, recent reports have shown that understanding words or sentences related to actions may require the presence of the specific motor circuits responsible for generating those actions, and therefore the parts of the brain responsible for language and actions might be interdependent (Hauk et al., 2004; Tettamanti et al., 2005). According to Chomskian ideas in conventional linguistics, linguistic competence has been regarded as independent from other competencies, including sensory-motor processing (see the argument on the faculty of language in the narrow sense by Hauser, Chomsky, and Fitch [2002] in section 2.1). This view, however, is now being challenged by recent evidence from neuroscience, including the aforementioned studies examining the interdependence between linguistic and other modalities. If everyday experiences involving speech and its corresponding sensory-motor signals tend to overlap during child development, synaptic connections between the two circuits can be reinforced by Hebbian
learning, as discussed by Pulvermuller (2005). This suggests the possibility that the meanings of words and sentences, as well as associated abstract concepts, can be acquired in association with related sensory-motor experiences. Researchers working in the area of cognitive linguistics have proposed the so-called usage-based approach (Tomasello, 2009), wherein it is argued that linguistic competency can be acquired through statistical learning of linguistic and sensory-motor stimuli during child development, without the need to assume innate mechanisms such as Chomsky's universal grammar. Analogous to these ideas is the view of Arbib (2012), discussed earlier, that the evolution from dexterous manual behaviors learned by imitation to the anticipated imitation of conventionalized gestures (protolanguage) is reflected in the evolution within the primate line and resulted in humans endowed with "language-ready" brains.

8.4.1 Model

In this context, we consider the possibly interdependent nature of language and motor action in terms of a mirror neuron model. This concept is based on a predictive coding model for linguistic competence assumed in the extension of Wernicke's area to Broca's area, and another predictive coding model for action competency assumed in the extension from Broca's area and the parietal cortex to the motor cortex. Broca's area, as a hub connecting these two distinct pathways, is assumed to play the role of unifying the two modalities by mirroring recognition in one modality and generation in the other while sharing the intention. The version of the RNNPB model proposed by Yuuya Sugita and me (Sugita & Tani, 2005) for investigating the task of recognizing a given set of action-related imperative sentences (word sequences) and of generating the corresponding behaviors (sensory-motor sequences) is shown in Figure 8.6. The model consists of a linguistic RNNPB and a behavioral RNNPB that are interconnected through PB units. The key idea of the model is that the PB activation vectors in both modules are bound to become identical for generating pairs of corresponding linguistic and behavioral sequences via learning. More specifically, in the course of associative learning of pairs of linguistic and behavioral sequences, the PB activation vectors in both modules are updated in the direction of minimizing their differences as well as minimizing the prediction error in both modalities (Figure 8.6a). By using the error signal back-propagated from both modules to the shared PB units, a sort of unified representation between the two modalities is formed through self-organization in the PB activations. After convergence of the bound learning, word sequences shown to the linguistic RNNPB can be recognized by inferring the PB activation values by means of error regression. Thereafter, the forward dynamics of the behavioral RNNPB, activated with the obtained PB activation values, generates a prediction of the corresponding sensory-motor sequences (Figure 8.6b).

[Figure 8.6 near here: (a) learning phase — linguistic and behavior modules coupled through shared PB units, with error signals from the teaching word-sequence target (wT+) and the teaching sensory-motor targets (mt+, st+); (b) recognition and generation phase — PB values inferred in the linguistic module are transferred to the behavior module.]
Figure 8.6. RNNPB model extended for language-behavior bound learning. (a) Bound learning of word sequences and corresponding sensory-motor sequences through shared PB activation and (b) recognition of word sequences in the linguistic recurrent neural network with parametric biases (RNNPB) and generation of corresponding sensory-motor sequences in the behavioral RNNPB. Redrawn from Tani et al. (2004).

8.4.2 Robot Experiments

Yuuya Sugita and I (Sugita & Tani, 2005) conducted robotics experiments on this model by utilizing a quasilanguage, with the aim of gaining insights into how humans acquire compositional knowledge about action-related concepts through close interactions between linguistic inputs and related sensory-motor experiences. We also addressed the issue of generalization in the process of learning linguistic concepts, which concerns the inference of the meanings of as yet unknown combinations of word sequences through a generalization capability related to the "poverty of the stimulus" problem (Chomsky, 1980) in human language development. A physical mobile robot equipped with vision and a one-DOF arm was placed in a workspace in which red, blue, and green objects were always located to the left, in front, and to the right of the robot, respectively (Figure 8.7).
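The shared-PB binding just described lends itself to a compact sketch. In the illustrative toy below, all names, dimensions, and coefficients are my assumptions, and simple linear-tanh maps stand in for the trained linguistic and behavioral RNNPBs; each module's PB is updated both to reduce its own prediction error and to reduce its difference from the other module's PB.

```python
import numpy as np

rng = np.random.default_rng(1)
PB_DIM = 2

# Stand-ins for the two trained modules: each maps a PB vector to a
# fixed-length output (word sequence / sensory-motor sequence).
W_lang = rng.normal(size=(5, PB_DIM))   # linguistic RNNPB surrogate
W_behv = rng.normal(size=(8, PB_DIM))   # behavioral RNNPB surrogate

def run(W, pb):
    return np.tanh(W @ pb)

# One training pair: a word-sequence target and its sensory-motor target,
# generated here from an arbitrary "ground truth" PB for the demo.
pb_star = np.array([0.8, -0.3])
target_lang, target_behv = run(W_lang, pb_star), run(W_behv, pb_star)

pb_l, pb_b = np.zeros(PB_DIM), np.zeros(PB_DIM)
lr, bind = 0.1, 0.5                     # bind pulls the two PBs together
for epoch in range(300):
    yl, yb = run(W_lang, pb_l), run(W_behv, pb_b)
    el, eb = target_lang - yl, target_behv - yb
    gl = W_lang.T @ ((1 - yl**2) * el)  # error back-propagated to PB units
    gb = W_behv.T @ ((1 - yb**2) * eb)
    pb_l += lr * (gl + bind * (pb_b - pb_l))  # minimize error AND difference
    pb_b += lr * (gb + bind * (pb_l - pb_b))

print("linguistic PB:", pb_l.round(2), " behavioral PB:", pb_b.round(2))
```

After convergence the two PB estimates coincide, which is the toy analogue of the unified representation formed across the two modalities.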
[Figure 8.7 near here: (a) the workspace with the red, blue, and green objects and the mobile robot with vision and a one-DOF hand at the home position; (b) schematic of the trajectory for "hit red."]
Figure 8.7. Robot experiment setup for language-behavior bound learning. (a) The task environment with the mobile robot in the home position and three objects in front of the robot. (b) A trained behavior trajectory for the command "hit red." Adopted from Tani et al. (2004) with permission.
A set of sentences consisting of three verbs (point, push, hit) and six nouns (left, center, right, red, blue, green) was considered. For example, "push red" means that the robot is to move to the red object and push it with its body, and "hit left" means that the robot is to move to the object to its left and hit it with its arm (Figure 8.7b). Note that "red" and "left" are synonymous in the setting of this workspace, as are "blue" and "center," and as are "green" and "right." For given combinations of verbs and nouns, corresponding actions in terms of sensory-motor sequences composed of more than 100 steps were trained by guiding the robot while introducing slight variations in the positions of the three objects with each trial. The sensory-motor sequences consist of sensory inputs in the form of several visual feature vectors, values for the motor torques of the arm and wheel motors, and motor outputs for the two wheels and the one-DOF arm. To investigate the generalization capabilities of the robot, especially in the case of linguistic training, only 14 of the 18 possible sentences were trained. This means that the behavioral categories corresponding to the four untrained sentences were learned without being bound to sentences.

8.4.3 Compositionality and Generalization

Recognition and generation tests were conducted after convergence in learning was attained by minimizing the error. Corresponding behaviors were successfully generated for all 18 sentences, including the four untrained ones. To examine the internal structures emerging as a result of self-organization in the bound learning process, an analysis of the PB mapping was conducted by taking two-dimensional principal components of the original six-dimensional PB space. Figure 8.8 shows the PB vector points corresponding to all 18 sentences as plotted in a two-dimensional space. These PB points were obtained as a result of the recognition of the corresponding sentences. The PB vector points for the four untrained word sequences are surrounded by dashed circles in the figure. First, it can be seen that PB points corresponding to sentences with the same verbs followed by synonymous nouns appeared close to each other on the two-dimensional map. For example, "hit left" and "hit red" appeared close to each other in the space. Even more interesting is that the PB mappings for all 18 sentences appeared in the form of a two-dimensional grid structure with one dimension for verbs and another for nouns.
[Figure 8.8 near here: 18 PB points plotted against the first and second principal components of the PB space, forming a grid with a verb axis (point, hit, push) and a noun axis (red/left, blue/center, green/right); the legend lists all verb-noun sentences.]
Figure 8.8. Mapping from PB vector points to generated word sequences. The two-dimensional grid structure consists of an axis for verbs and another for nouns. Four PB points surrounded by dotted circles correspond to untrained sentences (push red, push left, point green, and point right). Redrawn from Sugita and Tani (2005).

This means that the PB mapping emerged through self-organization of an adequate metric space, which can be used for compositional representation of acquired meanings in terms of combinations of verbs and object nouns. Furthermore, it should be noted that even the untrained sentences ("push red/left" and "point green/right") were mapped to appropriate points on the grid (see the points surrounded by dotted circles in Figure 8.8). This explains why untrained sentences were recognized correctly, as inferred from the successful generation of the corresponding behaviors. These results imply that meanings are acquired through generalization when a set of meanings is represented as a distribution of neural activity while preserving the mutual relationships between meanings in a binding metric space. Such generalization cannot be expected to arise if each meaning or concept is stored in a separate local module, as is the case in localist models. It is postulated that mutual interactions between different concepts during learning processes can eventually induce the consolidation of generalized structures in the memory structure, as represented earlier in the form of a two-dimensional distribution. This idea is analogous to what the PDP group (1986) argued in their connectionist book more than two decades ago (see section 5.4). Finally, I would like to add one more remark concerning the role of language in developing a compositional conceptual space. When the aforementioned experiments were conducted without binding the linguistic inputs in learning the same set of action categories, we found that nine different clusters corresponding to different actional categories developed without showing any structural relations among them, such as is illustrated by the aforementioned two-dimensional grid structure in the PB space. This result suggests that compositionality explicitly perceived in the linguistic input channel can enhance the development of compositionality in the actional channel via shared neural activity, perhaps, again, within the Broca's area of the human brain.
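The principal-component analysis behind Figure 8.8 is straightforward to reproduce in outline. The sketch below uses synthetic PB vectors built from additive verb and noun components purely for illustration — the real vectors came from trained networks — and then projects them onto the first two principal components, revealing the kind of grid layout described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in data: 18 six-dimensional PB vectors, one per
# sentence, constructed as verb + noun components plus noise.
verbs = rng.normal(size=(3, 6))           # point / push / hit
nouns = rng.normal(size=(6, 6))           # left/red, center/blue, ...
pb = np.array([v + n for v in verbs for n in nouns])
pb += 0.05 * rng.normal(size=pb.shape)

# Two-dimensional principal components of the PB space, via SVD of the
# mean-centered data (as in the analysis of Figure 8.8).
X = pb - pb.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc2 = X @ Vt[:2].T                        # project onto the first two PCs

for i, p in enumerate(pc2):
    print(f"sentence {i:2d}: PC1 = {p[0]:+.2f}, PC2 = {p[1]:+.2f}")
```

Because each synthetic vector is a sum of a verb component and a noun component, the projected points fall on an approximately regular grid, which is the signature of compositional structure in the metric space.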
8.5. Summary

We've now covered RNNPB models that can learn multiple behavioral schemes in the form of structures represented as distributions in a single RNN. The model is characterized by the PB vector, which plays an essential role in modeling mirror neuron functions in both the generation and recognition of movement patterns by forming adequate dynamic structures internally through self-organization. The model was evaluated through a set of robotics experiments involving the learning of multiple movement patterns, the imitation learning of others' movement patterns, and the generation of actional concepts via associative learning of protolanguage and behavior. The hallmark of these robotics experiments lies in their attempt to explain how generalization in learning, as well as creativity for generating diversity in behavioral patterns, can be achieved through self-organizing distributed memory structures. The contrast between the proposed distributed representation scheme and the localist scheme in this context is clear. In the localist scheme, each behavioral schema is memorized as an independent template in a corresponding local module, whereas in the distributed representation scheme, learning
is considered to include not just memorizing each template of behavioral patterns but also reconstructing them by extracting the structural relationships between the templates. If there are tractable relationships between the learned patterns in a set, these relationships should appear in the corresponding memory structures as embedded in a particular metric space. Such characteristics of distributed representation in the RNNPB model have been investigated by others as well (Ogata et al., 2006; Ogata et al., 2009; Zhong et al., 2014). The aforementioned characteristics were demonstrated clearly in the analysis of the PB mapping obtained from learning a set of movement patterns and from learning bound linguistic and behavioral patterns. The RNNPB model learned a set of experienced patterns not just as they were, but also deeply consolidated them, resulting in the emergence of novel or "creative" images. This observation might account for a fascinating mechanism of human cognition by which we humans can develop images or knowledge through multiple stages from our own limited experiences: in the first stage, each instance of experience is acquired; in the second stage, generalized images or concepts are developed by extracting relational structures among the acquired instances; and in the third stage, even novel or creative ones can be found in the memory developed with the relational structures after a long period of consolidation. Another interesting characteristic of the model is that it accounts for both top-down generation and bottom-up recognition processes by utilizing the same acquired generative model. Interactions between these two processes take place in offline learning as well as during real-time action generation and recognition. In offline learning, iterations of top-down and bottom-up interactions enable long-term development of the internal structures for PB mapping in terms of memory consolidation, as mentioned previously. In real-time action generation and recognition, shifts of the PB vector by means of error regression enable rapid adaptation to situational changes. As observed in the imitation game experiments, nontrivial dynamics emerge in the close interactions between top-down prediction and bottom-up recognition, leading to segmentation of the continuous perceptual flow into meaningful chunks. Complexity arises from the intrinsic characteristics of the mutual interactions occurring in the process, whereby recognition of the actions of others in the immediate past has a profound effect on the actions generated by the robot in the current step, which in turn
affects the recognition of these perceptual inputs in the immediate future, thereby forming a circular causality over the continuum of time between protention and retention. The same error regression mechanism can account for the problem of imitation: How can motor acts demonstrated by others be imitated by reading their intentions or mental states? It was shown that imitating others by inferring their mental states can be achieved by segmenting the resultant perceptual flow, by regressing the PB states with the prediction error. This prediction error may result in the subject becoming conscious while recognizing the shifts in the mental states of others as they alternate their motor acts. Finally, I assume there might be some concerns about the scalability of the RNNPB model — more specifically, whether there are limits to the degree of complexity that the learned behavioral patterns can have. Here, I just mention that this scalability issue depends heavily on how functional hierarchies can be developed that can decompose complex patterns into sets of simpler ones, or compose them vice versa, in the network. Accordingly, the next two chapters and the final chapter of this book are entirely dedicated to the investigation of this problem.
9 Development of Functional Hierarchy for Action
It is generally held that the brain makes use of hierarchical organization for both recognizing sensory inputs and generating motor outputs. As an example, chapter 4 illustrated how visual recognition proceeds in the brain from early signal processing in the primary visual area to object recognition in the inferior temporal area. It also described how action generation proceeds from the sequencing and planning of action primitives in the supplementary motor area and prefrontal cortex (PFC) to motor pattern generation in the primary motor cortex (M1). Although we don't yet completely understand what hierarchy and what levels exist in the brain and how they actually function, it is generally accepted that some form of functional hierarchy exists, whereby sensory-motor processing is conducted at the lower level and more global control of those processes occurs at the higher level. This functional hierarchy is also thought to be indispensable for expressing the essential human cognitive competency of compositionality — in other words, the composition and decomposition of whole complex action routines from and into reusable parts.
In speculating about possible neuronal mechanisms for a functional hierarchy that allows complex actions to be composed by sequentially combining behavior primitives (a set of commonly used behavior patterns), readers should note that there are various ways to achieve such composition. One possibility is to use a localist representation scheme. For example, Tani and Nolfi (1997, 1999) proposed a localist model called a "hierarchical mixture" of RNNs; Demiris and Hayes (2002) presented a similar idea in their proposal of Hierarchical Attentive Multiple Models for Execution and Recognition; and Haruno and colleagues (2003) did likewise in their proposal of the so-called hierarchical MOSAIC. The basic idea was that each behavior primitive is stored in its own independent local RNN at the lower level, and sequential switching of the primitives is achieved by a winner-take-all-type gate-opening control of these RNNs performed by the higher level RNN (see Figure 9.1). Information processing at the higher level is abstracted in such a way that the higher level only remembers which RNN in the lower level should be selected next, as well as the timing of switching over a longer timescale, without concerning itself with details about the sensory-motor profiles themselves.

[Figure 9.1 near here: a higher level RNN controls winner-take-all-type gate openings (Gate 1–3) over time steps; lower level RNNs each generate a perceptual prediction pattern (Pattern 1–3), which are combined through the gates.]
Figure 9.1. Hierarchical generation of perceptual sequence patterns in the hierarchical mixture of RNNs. As the higher level RNN dispatches the lower level RNNs sequentially by manipulating the openings of their attached gates, sequential combinations of primitive patterns can be generated.

Although the proposed scheme seems straightforward in terms of mechanizing a functional hierarchy, it faces the problem of miscategorization in dealing with perturbed patterns. Moreover, the discrete mechanism of dispatching behavior primitives through the winner-take-all-type selection of the lower RNNs tends to generate a certain level of information mismatch between the higher and lower levels.

Another possible mechanism can be considered by utilizing a distributed representation scheme in an extension of the RNNPB model. As I previously proposed (Tani, 2003), if a specific PB vector value is assigned to each acquired behavior primitive, sequential changes in the PB vector generated at the higher level by another RNN can cause corresponding sequential changes in the primitives at the lower level (Figure 9.2). The higher level RNN learns to predict event sequences in terms of stepwise changes in the PB vector, as well as the timings of such events.

[Figure 9.2 near here: a higher level RNN generates stepwise PB vector changes (PB1, PB2, ...) over time steps; the lower level RNNPB generates the corresponding primitive patterns (Pattern 1–3).]
Figure 9.2. Possible extension of the RNNPB model with hierarchy, wherein sequential stepwise changes in the PB vector at the higher level generate corresponding changes in the primitive patterns at the lower level. Redrawn from Tani (2003).

However, this scheme could also suffer from a similar problem of information mismatch between the two levels. If one behavior primitive is concatenated to another by corresponding stepwise changes in the PB vector, a smooth connection between the two primitives cannot be guaranteed. A smooth connection often requires some degree of specific adaptation of the profiles at the tail of the preceding primitive and at the head of the subsequent primitive, depending on their combination. Such fine adaptation cannot take place by simply changing the components of the PB vectors in a stepwise manner within the time necessary for the primitive to change. The same problem is encountered in the case of gated local network models if primitives are changed by simply opening and closing the corresponding gates. The crucial point here is that the generation of compositional actions cannot be achieved by simply transforming primitives into sequences in the same manner as manipulating discrete objects. Instead, the task requires fluid transitions between primitives, adapting them via interactions between top-down parametric control exerted on the primitives and bottom-up modulation of the signals implementing such parametric control. Close interactions could minimize the possible mismatch between the two sides, whereby we might witness what Alexander Luria (1973) metaphorically referred to as "kinetic melody" in the fluid generation of actions.

The following sections show that such fluid compositionality can be achieved without using preexisting mechanisms such as gating and parametric biases. Rather, it can emerge from intrinsic constraints on timescale differences in neural activity between multiple levels in the course of self-organization, accompanied by iterative interactions between the levels in consolidation learning. In the following, we see how the functional hierarchy that enables compositional action generation can be developed through the use of a novel RNN model characterized by its multiple-timescale dynamics. The model is tested in a task involving learning object manipulation and developing this learning. We then discuss a possible analogy between the observed synthetic developmental processes and real human infant developmental processes. The discussion helps to explain how fluid compositionality can be developed in both humans and artifacts through specific constraints within their brain networks.
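A toy version of the gated, localist scheme makes the mismatch problem visible. In the sketch below, all functions and constants are illustrative stand-ins: simple oscillators replace the trained lower level RNNs, and a fixed schedule replaces the higher level RNN. The winner-take-all gates produce abrupt, unadapted switches between primitives — exactly the discontinuity the text describes.

```python
import numpy as np

T = 120

# Three stand-in "lower level" primitive generators (in the localist scheme
# each would be its own trained RNN); here simple waveforms for illustration.
primitives = [
    lambda t: np.sin(0.3 * t),             # pattern 1
    lambda t: np.sign(np.sin(0.1 * t)),    # pattern 2
    lambda t: 0.5 * np.cos(0.2 * t),       # pattern 3
]

# A stand-in "higher level" schedule: which primitive to select and when to
# switch, abstracted away from sensory-motor detail.
def higher_level(t):
    return (t // 40) % 3                   # switch every 40 steps

out = np.zeros(T)
for t in range(T):
    gates = np.zeros(3)
    gates[higher_level(t)] = 1.0           # winner-take-all gate opening
    out[t] = sum(g * p(t) for g, p in zip(gates, primitives))

# Around t = 40 and t = 80 the output jumps discontinuously: nothing adapts
# the tail of one primitive to the head of the next.
print(out[38:43].round(2))
```

Nothing in this arrangement smooths the junctions between primitives, which is the information mismatch that motivates the multiple-timescale approach developed next.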
9.1. Self-Organization of Functional Hierarchy in Multiple Timescales

9.1.1 Multiple-Timescale Recurrent Neural Network

My colleague Yuuichi Yamashita and I (Yamashita & Tani, 2008) proposed a dynamic neural network model characterized by the dynamics of its neural activity at multiple timescales. This model, named the multiple-timescale recurrent neural network (MTRNN), is outlined in Figure 9.3. The MTRNN consists of interconnected subnetworks to which dynamics with different timescales are assigned. Each subnetwork takes the form of a fully connected continuous-time recurrent neural network (CTRNN) with a specific time constant τ assigned for the purposes of neural activation dynamics, as can be seen in Eq. 17 in section 5.5. The model shown in Figure 9.3 is composed of subnetworks with slow, intermediate, and fast dynamics, characterized by leaky-integrator neural units with larger, medium, and smaller values of τ, respectively. Additionally, the subnetwork with fast dynamics is subdivided into two peripheral modular subnetworks, one for proprioception/motor operations and one for vision.

[Figure 9.3 near here: left panel — the model architecture with slow, intermediate, and fast subnetworks (the fast level split into vision and proprioception modules), a top-down prediction path and a bottom-up error regression path; right panel — compositional generation of Action A and Action B from the initial intention states Init A and Init B.]
Figure 9.3. The multiple-timescale recurrent neural network (MTRNN) model. The left panel shows the model architecture, and the right panel shows the information flow in the case of top-down generation of different compositional actions, Action A and Action B, as triggered by the corresponding intentions in terms of the initial states Init A and Init B in the intention units, respectively.

Our expectation for the proposed multiple-timescale architecture was that the slow-dynamics subnetwork, using leaky-integrator units with a large time constant, should be good at learning long-time correlations, as indicated by Jaeger and colleagues (2007), whereas the fast-dynamics subnetwork should be good at learning precise, short-ranged patterns. We designed this particular model to generate targets of multiple perceptual sequences that contain a set of primitives or chunks acquired as a result of supervised learning. In this case, we made use of the sensitivity of the dynamics to initial conditions seen in nonlinear dynamics (see section 5.1) as a mechanism for selecting a specific sequence from among multiple learned ones as the intended one. The network dynamics always start from the same neutral neural states for all units, with the exception of some neural units in the slow-dynamics subnetwork referred to as intention units. By providing specific initial states for these intention units, the corresponding learned perceptual sequences can be regenerated; thus the initial states of the intention units play the role of selecting sequences, similar to the role of PB vectors in RNNPB models. The difference is that selection in the case of PB is based on parametric bifurcation, while in the case of intention units in MTRNNs it is performed by utilizing the sensitivity of the network dynamics to initial conditions. We decided to employ a switching scheme based on sensitivity to initial conditions for the MTRNN because this feature affords learning of sequence patterns with long-time correlations. Adequate mappings between the respective initial states of the intention units and the corresponding perceptual sequences are acquired by means of the error back-propagation through time learning scheme applied to the CTRNN (Eq. 18 in section 5.5). In the course of error back-propagation learning, two classes of variables are determined: the connection weights in all subnetworks and a specific set of initial state values of the intention units for each perceptual sequence to be learned. When learning commences, the initial state of the intention units for each training sequence is set to a small random value. The forward top-down dynamics initiated with this temporarily set initial state generates a predictive sequence for the training visuo-proprioceptive sequence. The error generated between the training sequence and the output sequence is back-propagated along the bottom-up path through the subnetworks with fast and intermediate dynamics to the subnetwork with slow dynamics, and this back-propagation is iterated backward through time steps via recurrent connections, whereby the connection weights within and between these subnetworks are modified in the direction of minimizing the error signal. The error signal is also back-propagated through time steps to the initial state of the intention units, whose initial state values for each training sequence are modified. Here, we see again that learning proceeds through dense interactions between top-down regeneration of the training sequences and bottom-up regression of the regenerated sequences utilizing error signals, just as in the RNNPB. One point to keep in mind here is that the dampening of the error signal in backward propagation through time steps depends on the time constant, as described previously (see Eq. 18 in section 5.5). It becomes smaller within the subnetwork with slow dynamics (characterized by a larger time constant) and greater within the subnetwork with fast dynamics (characterized by a smaller time constant). This forces the learning process to extract the underlying correlations spanning longer periods of time in the training sequences in the parts of the network with slower dynamics, and correlations spanning relatively shorter periods of time in the parts with faster dynamics.

The right panel of Figure 9.3 illustrates how learning multiple perceptual sequences consisting of a set of primitives results in the development of the corresponding functional hierarchy. First, it is assumed that a set of primitive patterns or chunks is acquired in the subnetworks with fast and intermediate dynamics through distributed representation. Next, a set of trajectories corresponding to slower neural activation dynamics appears in the subnetwork with slow dynamics in accordance with the initial state. This subnetwork, whose activity is sensitive to the initial conditions, induces specific sequences of primitive transitions by interacting reciprocally with the intermediate-dynamics subnetwork. In the slow-dynamics subnetwork, action plans are selected according to intention and are passed down to the intermediate-dynamics subnetwork for fluid composition of the primitives assembled in the fast-dynamics subnetwork. Note that changes in the slow dynamic activity play the role of a bifurcation parameter for the intermediate and fast dynamics, generating transitions of primitives.
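A minimal sketch of the multiple-timescale dynamics may help here. Assuming the standard discretized CTRNN leaky-integrator update (the continuous form being τ du/dt = -u + W a, with a = tanh(u)), the code below runs one network whose unit groups carry different time constants. The sizes, weights, and τ values are arbitrary illustrations, not the trained network, with the last two slow units playing the role of intention units.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative unit counts per timescale group (order: fast, mid, slow).
sizes = {"fast": 20, "mid": 10, "slow": 6}
taus = np.concatenate([np.full(20, 2.0),    # fast units: small tau
                       np.full(10, 5.0),    # intermediate units
                       np.full(6, 70.0)])   # slow units: large tau
N = sum(sizes.values())
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # random recurrent weights

u = np.zeros(N)
u[-2:] = np.array([0.5, -0.5])   # initial state of the "intention" units
                                 # (last two units of the slow group)
for t in range(200):
    a = np.tanh(u)
    # Euler-discretized leaky-integrator update of tau*du/dt = -u + W a:
    u = (1 - 1 / taus) * u + (1 / taus) * (W @ a)
    if t % 50 == 0:
        print(f"t={t:3d}  fast[0]={a[0]:+.2f}  slow[0]={a[-6]:+.2f}")
```

Running this makes the timescale separation tangible: the fast units' activations change from step to step, while the slow units drift only gradually from their initial (intention) values, which is what lets them carry long-range context.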
As another function, MTRNNs can generate motor imagery by feeding predicted visuo-proprioceptive states back in as future inputs, analogous to the closed-loop forward dynamics of the RNNPB. Diverse motor imagery can be generated by manipulating the initial state of the intention units. By this means, our robots with MTRNNs can become self-narrative about their own possibilities, as described later. Additionally, MTRNNs can perform both offline and online recognition of perceptual sequences by means of error regression, as in the case of the RNNPB model. For example, prediction errors caused by unexpected visual sensory input due to certain changes in the environment are back-propagated from the visual module of the fast-dynamics subnetwork, through the intermediate-dynamics subnetwork, to the intention units in the slow-dynamics subnetwork, whereby modulation of the activity of the intention units in the direction of minimizing the errors results in the adaptation of the currently intended action to the changed environment. These functions have been evaluated in a set of robotics experiments utilizing this model, as described later in this chapter.

9.1.2 Correspondence with Neuroscience

Now, let's revisit our previous discussions and examine briefly the correspondence of the proposed MTRNN model to concepts in system-level neuroscience. Because the neuronal mechanisms for action generation and recognition are still puzzling, owing to clear conflicts between different experimental results as discussed in chapter 4, the correspondence between the MTRNN model and parts of the biological brain can be investigated only in terms of plausibility at best. First, as shown by Tanji and Shima (1994), there is a timescale difference in the buildup of neural activation dynamics between the supplementary motor area (with slower dynamics spanning timescales on the order of seconds) and M1 (with faster dynamics on the order of a fraction of a second) immediately before action generation (see Figure 4.5); therefore, our assumption that the organization of a functional hierarchy involves timescale differences between regional neural activation dynamics should make sense in modeling the biological brain. Along these lines, Kiebel and colleagues (2008), Badre and D'Esposito (2009), and Uddén and Bahlmann (2012) proposed a similar idea to explain the rostral-caudal gradient of timescale differences, assuming slower dynamics at the rostral side (PFC) and faster dynamics at the caudal side (M1) of the frontal cortex to account for a possible functional hierarchy in the region. Accordingly, the MTRNN model assumes that the subnetwork with slow dynamics corresponds to the PFC and/or the supplementary motor area, and that the modular subnetwork with fast dynamics corresponds to the early visual cortex in one stream and to the premotor cortex or M1 in another stream (Figure 9.4). The subnetwork with intermediate dynamics may correspond to the parietal cortex, which can interact with both the frontal part and the peripheral part.

[Figure 9.4 near here: boxes for PFC/SMA (slow, with intention units), parietal (medium), motor (fast), and vision (fast), linked by top-down prediction and bottom-up error pathways.]
Figure 9.4. Possible correspondence of the MTRNN to parts of the biological brain. The solid line represents the top-down prediction pathway (from PFC/SMA via the parietal cortex to the motor and vision areas), and the dotted line represents the bottom-up error regression pathway (from the vision area via the parietal cortex to PFC/SMA).

One possible scenario for the top-down pathway is that the PFC sets the initial state of the activations with slow dynamics assumed in the supplementary motor cortex, which subsequently propagate to the parietal cortex, assumed to exhibit intermediate-timescale dynamics. Activations in the parietal cortex propagate further into the peripheral cortices (the early visual cortex and the premotor or primary motor cortex), whereby detailed predictions of visual sensory input and proprioception are made, respectively, by means of neural activations with fast dynamics. On the other hand, prediction errors generated in those peripheral areas are propagated backward to the forebrain areas through the parietal cortex via bottom-up error regression in both learning and recognition — assuming, of course, that the aforementioned retrograde axonal signaling mechanism of brains implements the error back-propagation scheme (see section 5.5). In this situation, the parietal cortex, wedged between the frontal and peripheral parts, plays the role of an information hub that integrates multiple input modalities and motor outputs with the current intention for action. It has been speculated that populations of bimodal neurons in the parietal cortex, which have been shown to encode multiple modalities of information processing, such as vision and motor outputs (Sakata et al., 1995) or vision and somatosensory inputs (Hyvarinen & Poranen, 1974), are the consequence of synaptic modulation accompanied by top-down prediction and bottom-up error regression in the iterative learning of behavioral skills.

It is worth pausing here a moment to think about what the initial states actually mean in the brain. Because the initial states unfold into sequences of behavior primitives, which are expanded into target proprioceptive sequences and finally into motor command sequences, it can be said that motor programs can be represented by the initial states of particular neural dynamics in the brain. Coincidentally, as I was writing this section, Churchland and colleagues published new results from monkey electrophysiological experiments that support this idea (Churchland et al., 2012). They conducted simultaneous recordings of multiple neurons in the motor and premotor cortices while monkeys repeatedly reached in varying directions and at various distances. The collective activities of neuron firings were plotted in a two-dimensional state space from their principal components, in the same way Churchland had done before (see Figure 4.12). A nontrivial finding was that, after movement onset, the neural activation state exhibited a quasirotational movement in the same direction but with different phase and amplitude in the two-dimensional state space for each different case of reaching. The differences in the development of the neural activation state were due to differences in the initial state at the moment of movement onset. Churchland and colleagues interpreted this as follows: the preparatory activity sets the initial state of the dynamic system for generating quasirotational trajectories, and their subsequent evolution produces the corresponding movement activity. Their interpretation is quite analogous to the idea Yamashita and I proposed: motor programs might be represented in terms of the initial states of particular neural dynamical systems. The next section describes a robotics experiment pursuing this line of reasoning utilizing the MTRNN model.
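The initial-state mechanism itself is easy to demonstrate. In the following toy — the weights, leak rate, and readout are arbitrary choices of mine, not the trained MTRNN — two different initial values of the designated intention units unfold into clearly different trajectories from otherwise identical neutral states.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 12
W = rng.normal(0.0, 1.5 / np.sqrt(N), (N, N))  # recurrent weights (illustrative)

def rollout(init_intention, steps=60):
    """Unfold the network from a neutral state except for the intention
    units (here, the last two units), whose initial values select the
    sequence that is generated."""
    u = np.zeros(N)
    u[-2:] = init_intention
    traj = []
    for _ in range(steps):
        u = 0.7 * u + 0.3 * (W @ np.tanh(u))   # simple leaky dynamics
        traj.append(np.tanh(u[0]))             # observe one output unit
    return np.array(traj)

# Two different initial intention states unfold into different trajectories:
# the mechanism MTRNN uses in place of the RNNPB's parametric bifurcation.
a = rollout(np.array([+0.8, -0.8]))
b = rollout(np.array([-0.8, +0.8]))
print("divergence after 60 steps:", np.abs(a - b)[-1].round(3))
```

In the actual model the mapping from initial states to sequences is shaped by back-propagation through time, so that each intention state reliably regenerates its learned visuo-proprioceptive sequence.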
9.2. Robotics Experiments on Developmental Training of Complex Actions

This section shows how the MTRNN model can be used in humanoid robot experiments on learning and generating skilled actions.
9.2.1 Experimental Setup

I conducted the following studies to investigate how a humanoid robot can acquire skills for performing complex actions by organizing a functional hierarchy in the MTRNN through interactive tutoring processes (Yamashita & Tani, 2008; Nishimoto & Tani, 2009). A small humanoid robot, QRIO, was trained on a set of object manipulation tasks in parallel through iterative guidance provided by a teacher. The robot could move its arms by activating joint motors with eight degrees of freedom (DOF) and was also capable of arm proprioception by means of encoder readings for these joints. The robot used a vision camera that automatically tracked a colored point placed at the center of the object; the joint angles of the camera head (two DOF) therefore represent visual sensory input corresponding to the object position. The robot was trained on three different tasks in sequence (shown in Figure 9.5), each of which consisted of sequential combinations of different cyclic movement patterns applied to the object.

[Figure 9.5 near here: Task 1 — move up and down, then move left and right; Task 2 — move forward and back, then touch by each hand; Task 3 — touch by both hands, then rotate in the air; each task starts from and returns to the home position.]
Figure 9.5. A robot trained on three behavioral tasks, each of which is composed of a sequence of behavior primitives. After the third session, Task 3 was modified, as illustrated by the dotted lines. Adopted from Nishimoto and Tani (2009) with permission.

The training was conducted interactively in cycles of training sessions, meaning that the arms were physically guided to follow adequate trajectories while the robot attempted to generate its own trajectories based on its previously acquired skills. In this sense, it can be said that the actual training trajectories were "codeveloped" by the teacher and the robot. Through this physical guidance, the robot eventually perceived a continuous visuo-proprioceptive (VP) flow without explicit cues for segmenting the flow into primitives of movement patterns. In the course of developmental learning, the robot was trained gradually over five sessions. During each session, all three tasks were repeated while introducing changes in the object position, and the network was trained with all training data obtained during the session. After each training session, offline training of the MTRNN was conducted by utilizing the VP sequences obtained in the process of guidance, in which the connection weights and the initial states of the intention units for all task sequences were updated. Subsequently, the performance of both open-loop physical behavior and closed-loop motor imagery was tested for all three tasks. Novel movement patterns were added to one of the tasks during the development process for the purpose of examining the capability of the network for incremental learning of new behavioral patterns (see Task 3 in Figure 9.5).

The employed MTRNN model consisted of 36 units with fast dynamics for vision and 144 units with fast dynamics for proprioception (τ = 1.0), 30 units with intermediate dynamics (τ = 5.0), and 20 units with slow dynamics (τ = 70.0). The units with slow and intermediate dynamics were fully interconnected, as were the units with fast and intermediate dynamics, whereas the units with slow and fast dynamics were not connected directly. It was assumed that this kind of connection constraint would allow functional phenomena such as information bottlenecks or hubs to develop in the subnetwork with intermediate dynamics.

9.2.2 Results

The developmental learning of multiple goal-directed actions successfully converged after five training sessions, even in the case of Task 3, which was modified with the addition of a novel primitive pattern after
the third session. The developmental process can be categorized into several stages, and Figure 9.6 shows the process for Task 1 for the first three sessions. Plots are shown for the trained VP trajectories (left), motor imagery (middle), and actual output generated by the robot (right). The profiles for the units with slow dynamics in the motor imagery and the actually generated behavior were plotted for their first four principal components after conducting principal component analysis (PCA). In the first stage, which mostly corresponds to Session 1, none of the tasks were accomplished, as most of the actually generated movement patterns were premature, and the time evolution of the activations of the units with slow dynamics was almost flat. In the second stage, corresponding to Session 2, most of the primitive movement patterns were actually generated, showing some generalization with respect to changes in object position, although their correct sequencing was not yet complete. In the third stage, corresponding to Session 3 and subsequent sessions, all tasks were successfully generated with correct sequencing of the primitive movement patterns and with good generalization with respect to changes in object position. The activations of the units with slow dynamics became more dynamic compared with previous sessions, in the case of both motor imagery and the generation of physical actions. In summary, then, the level responsible for the organization of primitive movement patterns developed during the earlier period, and the level responsible for the organization of these patterns into sequences developed in later periods. One important point I want to make here is that there was a lag between the time when the robot became able to generate motor imagery and the time when it started generating actual behaviors. Motor imagery was generated earlier than the actual behavior: the motor imagery for all tasks was nearly complete by Session 2, as compared to Session 3 in the case of actually generated behaviors. This outcome is in accordance with the arguments of contemporary developmental psychologists such as Karmiloff-Smith (1992) and Diamond (1991), who consider that 2-month-old infants already possess intentionality toward objects they wish to manipulate, although they cannot reach for or grip them properly due to the immaturity of their motor control skills. Moreover, this developmental course of the robot's learning supports the view of Smith and Thelen (2003) that development is better understood as the emergent product of many local interactions that occur in real time.
Figure 9.6. Development of Task 1 for the first three sessions, with trained VP trajectories (left), motor imagery (middle), and actual generated behavior (right), accompanied by the profiles of units with slow dynamics after conducting principal component analysis. (a) Session 1, (b) Session 2, (c) Session 3. Adopted from Nishimoto and Tani (2009) with permission.
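The connectivity constraint described in the setup can be expressed as a simple mask. In this sketch the unit counts and time constants follow the text (36 + 144 fast, 30 intermediate, 20 slow; τ = 1.0, 5.0, 70.0), while the mask construction, the group names, and the assumption that the two fast modules are not directly interconnected are my own illustrative choices.

```python
import numpy as np

# Groups: (number of units, time constant tau), per the described setup.
groups = {
    "vision_fast":  (36,  1.0),
    "proprio_fast": (144, 1.0),
    "intermediate": (30,  5.0),
    "slow":         (20, 70.0),
}
offsets, n = {}, 0
for g, (size, _) in groups.items():
    offsets[g] = (n, n + size)
    n += size

# Permitted between-group connections: slow <-> intermediate and
# intermediate <-> each fast module; slow and fast are NOT directly linked.
allowed = {
    ("slow", "intermediate"), ("intermediate", "slow"),
    ("intermediate", "vision_fast"), ("vision_fast", "intermediate"),
    ("intermediate", "proprio_fast"), ("proprio_fast", "intermediate"),
}
mask = np.zeros((n, n), dtype=bool)      # mask[dst, src] = connection allowed
for g in groups:
    a, b = offsets[g]
    mask[a:b, a:b] = True                # full connectivity within a group
for src, dst in allowed:
    (a, b), (c, d) = offsets[src], offsets[dst]
    mask[c:d, a:b] = True                # between permitted groups only

print("units:", n, " allowed connections:", int(mask.sum()))
print("slow -> fast directly?",
      mask[slice(*offsets["vision_fast"])][:, slice(*offsets["slow"])].any())
```

Forcing all slow-fast traffic through the intermediate group is what makes that group a candidate information bottleneck or hub, as the text anticipates.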
Another interesting observation from this experiment was that the profiles of the training trajectories also developed across sessions. The training trajectories in Session 1 were quite distorted: training patterns such as UD (moving up and down) in the first half and LR (moving left and right) in the second half did not form regular cycles. This is typical when cyclic patterns are taught to robots without using metronome-like devices. However, the cyclic patterns in the training process became much more regular as the sessions proceeded. This is due to the development of limit-cycle attractors in the MTRNN, which shaped the trajectories trained through direct guidance into more regular cyclic ones via physical interactions. This result shows a typical example of the codevelopment process undertaken by the robot and the teacher, whereby the robot's internal structures develop via dense interactions between the top-down intentional generation of the robot's movement and the bottom-up recognition of the teacher's intention in guiding the robot's movement. The interaction modifies not only the robot's action but also the teacher's. When I tried to physically guide the robot's arms to move slightly differently from its own movement by grasping the arms, I became aware of its intention of persistence through the resistance force perceived in my hands. This modified my teaching intention and the resultant guidance trajectory to some degree. In this sense, it can be said that the robot's behavior trajectory and my teaching trajectory codeveloped during the experiment.

Next, let's see how neurodynamics with different timescales successfully generates sets of action tasks consisting of multiple movement patterns. Figure 9.7 shows how the robot behaviors were generated in the test run after five training sessions.

[Figure 9.7 near here: three columns of plots, (a)–(c), one per task, each showing teach and generation VP trajectories over steps together with intermediate- and slow-unit activation profiles after PCA.]
Figure 9.7. Visuo-proprioceptive trajectories (two normalized joint angles denoted as Prop 1 and Prop 2 and the camera direction denoted as Vision 1 and Vision 2) during training and actual generation in Session 5, accompanied by activation profiles of intermediate and slow units after principal component analysis, denoted as PC 1–4. (a) Moving up and down (UD) followed by moving left and right (LR) in Task 1; (b) moving forward and backward (FB) followed by touching by left hand and right hand (TchLR) in Task 2; (c) touching by both hands (BG) followed by rotating in the air (RO) in Task 3. Adopted from Nishimoto & Tani (2009) with permission.

First, as can be seen in the first and second rows of Figure 9.7, the VP trajectories of the trained robot were successfully generated for all three tasks, accompanied by changes in the cyclic movement patterns. Looking at the activation dynamics of the units with intermediate dynamics (shown in the fourth row) after conducting PCA, it is clear that their dynamics are correlated with the VP trajectories. However, the activation dynamics of the units with slow dynamics, which started from different initial states for each of the three tasks, developed to be uncorrelated with the VP trajectories or the trajectories of the units with intermediate dynamics (see the bottom row). Also, the slow profiles changed drastically as the movement patterns changed. However, the transitions were still smooth, unlike the stepwise changes seen with gate opening or PB switching, as described in the previous section. Such drastic but smooth changes in the slow context profile were tailored by means of dense interactions between top-down forward prediction and bottom-up error regression. The bottom-up error regression tends to generate rapidly changing profiles at the moment of switching, whereas the top-down forward prediction tends to generate only slowly changing profiles because of its large time constant. The collaboration and competition between the two processes result in such natural, smooth profiles. After enough training, all actions are generated unconsciously, because no prediction error is generated in the course of well-practiced trajectories unless unexpected events, such as dropping the object, are encountered.

Further insight was obtained by observing how the robot managed to generate action when perturbed by external inputs. In Task 1, the experimenter, by pulling the robot's hand slightly, could induce the robot to switch action primitives from moving up and down to moving left and right earlier than the four cycles for which it had been trained. This implies that counting at the higher level is an elastic dynamic process rather than a rigid, logical computational one, one that can be modulated by external inputs such as being pulled by the experimenter. An interesting observation was that the action primitive of moving up and down was smoothly connected to the next primitive of moving the object to the left, which took place right after placing the object on the floor, even though the switch was made after an incorrect number of cycles. The transitions never took place halfway through an ongoing primitive; they were always made at the same connection point, regardless of the incorrect number of cycles at the transition. This observation suggests that the whole system was able to generate action sequences with fluidity and flexibility by adequately arbitrating between the higher level, which had been trained to count a specific number of cycles before switching, and the lower level, which had been trained to connect one primitive to another at the same point. In this case, the intention from the higher level was elastic enough to give in to an incorrect count under the bottom-up force exerted by the experimenter, whereas the lower level succeeded in connecting the first primitive to the second at the same point as trained. Our proposed dynamic systems scheme allows this type of dynamic conflict resolution between different levels by letting them interact densely.
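Finally, the online error regression that produced this adaptivity can be caricatured in a few lines. In the toy below, a single "intention" scalar sets the phase of a predicted oscillation, standing in for the intention units; gradient steps on the prediction error recover the value that matches the observed flow. All constants here are arbitrary illustrations.

```python
import numpy as np

# A toy "network": the output depends on an internal intention scalar z.
# Online error regression nudges z to reduce the prediction error, standing
# in for the back-propagation of errors to the intention units.
def predict(z, t):
    return np.sin(0.2 * t + z)            # phase plays the role of intention

z_true, z_est, lr = 1.5, 0.0, 0.4
for t in range(100):
    target = np.sin(0.2 * t + z_true)     # what the environment actually does
    y = predict(z_est, t)
    err = target - y
    # Gradient of 0.5*err^2 w.r.t. z_est is -err * cos(0.2*t + z_est),
    # so gradient descent adds +lr * err * cos(0.2*t + z_est).
    z_est += lr * err * np.cos(0.2 * t + z_est)

print("recovered intention:", round(z_est, 2), " true:", z_true)
```

The same logic, run through the full network with backpropagated errors, is what lets the robot revise its currently intended action online when the environment, or the experimenter's hand, contradicts its predictions.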
9.3. Summary
This chapter was entirely dedicated to an examination of functional hierarchy by exploring the potential of the MTRNN model. The experimental results suggest that sequences of primitives are abstractly represented in the subnetwork consisting of units with slow dynamics, whereas detailed patterns of behavior primitives are generated in the subnetworks consisting of units with fast and intermediate dynamics. We can conclude that a sort of "fluid compositionality" for the smooth and flexible generation of actions is achieved through self-organization of a functional hierarchy, by utilizing the timescale differences as well as the structural connectivity among different levels in the proposed MTRNN model.

These findings provide a possible explanation for how different functional roles can be assigned to different regions in brains (i.e., the PFC for creating abstract actional plans and the parietal cortex for composing sensory-motor details). Such assignments in the brain may not be tailor-made by a genome program, but may instead result from self-organization via development and learning under various structural constraints imposed on the anatomy of the brain, including connectivity among local regions with bottlenecks and timescale differences in neuronal activities. This can be accounted for by a well-known concept in complex adaptive systems, downward causation (Campbell, 1974; Bassett & Gazzaniga, 2011), which denotes a causal relationship from the global whole to the local parts. It can be said that the functional hierarchy emerges by means of upward causation, in the form of collective neural activity in both the forward activation dynamics and the error back-propagation, constrained by downward causation in the form of timescale differences, network topology, and environmental interaction. The observed fluid compositionality, which Luria metaphorically expressed as "kinetic melody," presumably results from this. It was also shown that the capability for abstraction through hierarchy in the MTRNN can provide robots with a competency for self-narrative about their own actional intentions via mental simulation. The reflective selves of robots may start from this point.

Readers may ask a crucial question. Can the time constant parameters in the MTRNN be adapted via learning, or do they have to be set by the experimenters as in the current version? Hochreiter and Schmidhuber (1997) proposed the "long short-term memory" (LSTM) RNN model,
which is characterized by its dynamic memory mechanism implemented in "memory cells." A memory cell can keep its current dynamic state for arbitrarily long time steps, without any specific parameter setting, by means of its associated adaptive gate opening-closing mechanisms learned via the error back-propagation scheme. If the memory cells were allocated in multiple levels of subnetworks, it would be interesting to examine whether a functional hierarchy can be developed by organizing long-term memory in the higher level and shorter-term memory in the lower level. Actually, the MTRNN model was originally developed with a time-constant adaptation mechanism using a genetic algorithm (Paine & Tani, 2005). Simulation experiments on robot navigation learning using this model showed that a functional hierarchy for navigation control of the robot developed by evolving slower and faster dynamic structures between two levels of the subnetworks, provided that a bottleneck connection was prepared between them.

Some may point out that brains should also involve a spatial hierarchy, as evidenced by the accumulated studies on the visual recognition pathway (see section 4.1). Hasson and colleagues (Hasson et al., 2008) likewise suggested the development of a spatio-temporal hierarchy in human visual cortex. In response to this concern, our group (Jung et al., 2015; Choi & Tani, 2016) has recently shown that a spatio-temporal hierarchy can be developed successfully in a neurodynamic model referred to as the multiple spatio-temporal neural network (MSTNN) for the recognition as well as generation of compositional human action sequence patterns, represented as pixel-level video images, when both spatial and temporal constraints are applied to the neural activation dynamics at multiple scales for different levels. Furthermore, the MSTNN and the MTRNN have been integrated in a simulated humanoid robot platform (Figure 9.8), by which the simulated robot becomes able to generate object manipulation behaviors corresponding to visually demonstrated human gestures via end-to-end learning from the video image inputs to the motor outputs (Hwang et al., 2015). As a result of end-to-end learning of various combinations of gesture patterns and the corresponding motor outputs for grasping objects of different shapes, it was found that intentions for grasping different objects develop in the PFC subnetwork, characterized by the slowest timescale in the whole network.
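For readers unfamiliar with the memory-cell mechanism just mentioned, here is a minimal sketch of a standard LSTM cell update in NumPy. This is the generic textbook form, not the specific models discussed above, and the weight shapes and sizes are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, W, b):
        # The forget gate f controls how long the memory state c persists,
        # so the effective timescale of each cell is learned from data
        # rather than fixed in advance as a time constant.
        z = np.concatenate([h, x])
        f, i, o, g = np.split(W @ z + b, 4)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_next = sigmoid(o) * np.tanh(c_next)
        return h_next, c_next

    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 5
    W = rng.normal(scale=0.3, size=(4 * n_hid, n_hid + n_in))
    b = np.zeros(4 * n_hid)
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.normal(size=(100, n_in)):
        h, c = lstm_step(x, h, c, W, b)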
Figure 9.8. A simulated humanoid robot learns to generate object manipulation behaviors as specified by human gestures demonstrated to the robot by video image. (a) Task space and (b) the integrated model of the MSTNN for dynamic video image processing and the MTRNN for motor pattern generation.

Going back to the robotics experiment using the MTRNN, we observed that actions could be generated compositionally depending on
the initial states of the intention units. However, this naturally poses the question of how the initial state is set (Park & Tani, 2015). Is there any way that the initial state representing the intentionality for action could be self-determined and set autonomously, rather than being set by the experimenter? This issue is related to the problem of the origin of spontaneity or free will, as addressed in section 4.3. The next chapter explores this issue by examining the results from several synthetic robotics experiments while drawing attention to possible correspondences with the experimental results of Libet (1985) and Soon and colleagues (2008).
10 Free Will for Action and Conscious Awareness
We first explore how intentions for actions can be generated spontaneously in higher cognitive brain areas by reviewing our robotics experiments. As I wrote in section 4.3, Libet (1985) demonstrated that awareness of intention is delayed, a result later confirmed by Soon and colleagues (2008). Later sections investigate this problem by clarifying causal relationships shared by free will and consciousness.
10.1. A Dynamic Account of Spontaneous Behaviors
Although we may not be aware of it, our everyday life is full of spontaneity. Let's take the example of the actions involved in making a cup of instant coffee, something we are all likely to be very familiar with. After I've put a spoonful of coffee granules in my mug and have added hot water, I usually add milk and then either add sugar or not, which is determined rather unconsciously. Frequently, I notice only later, when I take the first sip, that I actually added sugar. Some parts of these action sequences are defined and static―I must add the coffee granules and hot water―but other parts are optional, and this is where I can see
spontaneity in the generation of my own actions. A similar comment can be made about improvisation in playing jazz or in contemporary dance, where musical phrases or body movement patterns are created freely and on the spot, in an unpredictable manner. It seems that spontaneity appears not within a chunk but at junctions between chunks in behavior streams. Chunks are behavior primitives, such as pouring hot water into a mug or repeating musical phrases, which are presumably acquired through practice and experience, as I have mentioned many times already. Junctions between behavior primitives embody weaker relationships than those within the primitives themselves, because junctions appear less frequently than primitives in repeated behavioral experience. Indeed, psychological observations of child development as well as adult learning have suggested that chunk structures can be extracted through statistical learning with a sufficiently large number of perceptual and behavioral experiences (e.g., Klahr et al., 1983; Saffran et al., 1996; Kirkham et al., 2002; Baldwin et al., 2008). Here, the term "chunk structures" denotes repeatable patterns of action sequences unified as "chunks," taking into account the probabilistic state transitions between those chunks at the "junctions."

One question essential to the problem of free will arises. How can subsequent chunks or behavior primitives be considered freely selected if one simply follows a learned statistical expectation? If we consider someone who has learned that the next behavior primitive to enact in a certain situation is either A or B, provided that past experience defines equal probabilities for A and B, it is plausible that either of the primitives might be enacted, so there is at least the apparent potential for freely chosen action in such instances. However, following the studies by Libet (1985) and Soon and colleagues (2008) discussed in section 4.3, voluntary actions might originate from neural activities in the supplementary motor area, prefrontal cortex, or parietal cortex, and in no case are these activities accompanied by awareness. Thus, even though one might believe that the choice of a particular action from among multiple possibilities (e.g., primitives A, B, and C) has been entirely conscious, in fact this apparently conscious decision has been precipitated by neural activity not subject to awareness; free will, indeed, seems not so freely determined after all.

Our MTRNN model can account for these results by assuming that neural activities preceding apparently freely chosen actions are represented by the initial states of the intentional units located in the
network with slow dynamics. However, this explanation generates further questions: (1) how are the values of the initial states set for initiating voluntary actions, and (2) how can conscious awareness of the decision emerge with delay? To address these problems, my colleagues and I conducted some neurorobotics experiments involving the statistical learning of imitative actions (Namikawa et al., 2011). The following experimental results highlight the role of cortical itinerant dynamics in generating spontaneity.

10.1.1 Experiment
A humanoid robot was trained to imitate actions involving object manipulation through direct guidance by an experimenter. The setup used for the robot and the way its movements were guided were the same as in our experiment described in section 9.2 (and in Yamashita and Tani, 2008). The target actions, shown in Figure 10.1, included stochastic transitions between primitive actions. The object was located on the workbench in one of three positions (left, center, or right), and the experimenter repeated primitive actions that consisted of picking up the object, moving it to one of the other two possible positions, and releasing it, by guiding the hands of the robot while deciding the next object position randomly with equal probability (50%).
Figure 10.1. Object manipulation actions to be imitated by a Sony humanoid robot. (a) The task consists of stochastic transitions between primitive actions: moving an object to one of two possible positions with equal probability after reaching and grasping it. (b) Trajectory of the center of mass of the object as observed by the robot's vision system. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.

This process generated 24 training
sequences, each of which consisted of 20 transitions between primitive actions, amounting to about 2,500 time steps of continuous visuo-proprioceptive sequences. The time constants of the employed MTRNN were set to 100.0, 20.0, and 2.0 for units with slow, intermediate, and fast dynamics, respectively. It is noted that in this experiment the lower level was assembled from a set of gated RNNs (Tani & Nolfi, 1999) that interacted directly with the visuo-proprioceptive sequences, and the intermediate subnetwork controlled the gate opening via its outputs. After offline training of the network, the robot was tested on imitating (generating) each training sequence by setting the network to the corresponding acquired initial state. Although the trained primitive action sequences were reproduced exactly during the initial period, consisting of several primitive action transitions, the sequences gradually started deviating from the learned ones. This was considered to be due to sensitivity to initial conditions in the trained network. Statistical analysis conducted on the transition sequences generated over longer periods showed that the probabilities with which the transitions between the primitive actions were reproduced were quite similar to those to which the robot was exposed during the training period. The same analysis was repeated for cases with different transition probabilities among the target actions. When the transition probabilities for some of the target actions were changed to 25% and 12.5%, the corresponding sequences were generated in the same proportions in each case. An analysis of the sequences produced by the trained network for each case showed that the transition probabilities of the reproduced actions mostly followed the target ones, with deviations of only a few percent. These results imply that the proposed model, although unable to learn to imitate the long visuo-proprioceptive sequences exactly, could extract the statistical structures (chunks) with their corresponding transition probabilities from these sequences.
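This kind of statistical analysis can be illustrated with a short sketch: generate a symbol sequence from given transition probabilities, then estimate the empirical probabilities back from it. The labels and probabilities below are hypothetical stand-ins for the categorized primitive actions, not data from the experiment.

    import random
    from collections import Counter, defaultdict

    target = {"L": {"C": 0.5, "R": 0.5},
              "C": {"L": 0.5, "R": 0.5},
              "R": {"L": 0.5, "C": 0.5}}

    def generate(start, n, table):
        # Sample a sequence of primitive-action labels from the table.
        seq = [start]
        for _ in range(n):
            labels, probs = zip(*table[seq[-1]].items())
            seq.append(random.choices(labels, probs)[0])
        return seq

    def estimate(seq):
        # Count observed transitions and normalize per current label.
        counts = defaultdict(Counter)
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
        return {cur: {s: round(c / sum(ctr.values()), 3) for s, c in ctr.items()}
                for cur, ctr in counts.items()}

    print(estimate(generate("C", 10000, target)))  # close to the 50/50 target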
Let's now examine the main issue in this context, namely the origin and indeterminacy of spontaneity in choosing subsequent primitive actions. The prevailing opinion might be that spontaneity is simply due to noise in the (external) physical world, which induces transitions between primitive actions represented by different attractors. The following experiment, however, shows that this is not the case. Furthermore, in examining whether the same statistical reproduction could also be observed in the case of motor imagery rather than actual motor action, it turned out that the answer is affirmative. This turns out to be quite important because, as motor imagery is generated
deterministically in offline simulation, without any contamination from external sensory noise, the observed stochastic properties must be due to internally generated fluctuations rather than noise-induced perturbations. In other words, the spontaneity observed at junctions between chunks of action sequences seems to arise from within the robots, by way of processes perfectly consistent with the results of Libet and Soon. To gain some insight into this phenomenon, let's look at the neural activation sequences in units with different timescales, associated with the visuo-proprioceptive sequences during the generation of motor imagery, as shown in Figure 10.2.
Figure 10.2. Time evolution of neural activities associated with visuo-proprioceptive sequences in motor imagery. Capital letters shown in the first panel denote the primitive actions executed (R: moving to the right, L: moving to the left, and C: moving to the center). Plots in the first and second panels show predicted vision and proprioception outputs, respectively. Plots in the third and fourth panels show, with different shades of gray, the activities of 30 neural units in the subnetworks with intermediate and slow dynamics, respectively. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.
It can be seen that the neural activities in the subnetworks with intermediate and slow dynamics develop with their intrinsic timescale dynamics. In the plot of intermediate neural activity, the dynamic pattern repeats whenever the same action primitive is generated. In the plot of the slow dynamics subnetwork, on the other hand, no such apparent regularity or repeated pattern of activity can be observed. To examine the dynamic characteristics of the networks, a dynamic measure known as the Lyapunov exponent was calculated for the activity of each subnetwork. The Lyapunov exponent is a multidimensional vector that indicates the rates of divergence of adjacent trajectories in a given dynamic system. If the largest component of this vector is positive, chaos is generated by means of the stretching-and-folding mechanism described in section 5.1. The analysis showed that the maximum Lyapunov exponent was positive for the subnetwork with slow dynamics and negative for the subnetworks with intermediate and fast dynamics. The results were repeatable across different training runs of the network, implying that chaos emerged in the subnetwork with slow dynamics but not in the other subnetworks. Therefore, deterministic chaos emerging in the subnetwork with slow dynamics might affect the subnetworks with intermediate and fast dynamics, generating pseudostochastic transitions between primitive action sequences. Readers may see that this result corresponds exactly with the aforementioned idea (illustrated in Figure 6.2) that chaos in the higher level network can drive compositional generation of the action primitives stored in the lower level, as well as with what Braitenberg's Vehicle 12 predicted in section 5.3.2.
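For readers who want to see the measure in action, here is a minimal sketch of the standard two-trajectory estimate of the largest Lyapunov exponent for a discrete map. The logistic map is used only as a convenient test system with a known exponent (ln 2 ≈ 0.693), not as a model of the network.

    import numpy as np

    def largest_lyapunov(step, x0, n_steps=100000, eps=1e-8):
        # Track a reference trajectory and a nearby one, renormalizing
        # their separation at every step; the average log stretching rate
        # estimates the largest Lyapunov exponent (positive => chaos).
        x = np.array(x0, dtype=float)
        y = x + eps
        total = 0.0
        for _ in range(n_steps):
            x, y = step(x), step(y)
            d = np.linalg.norm(y - x)
            total += np.log(d / eps)
            y = x + (y - x) * (eps / d)
        return total / n_steps

    # Logistic map at r = 4; the estimate should approach ln 2 = 0.693.
    print(largest_lyapunov(lambda x: 4.0 * x * (1.0 - x), [0.3]))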
To clarify the functional role of each subnetwork, we next conducted an experiment involving a "lesion" artificially created in one of the subnetworks. The trajectory of the manipulated object generated as visual imagery by the original intact network was compared with the one generated by the same network but with a lesion in the subnetwork with slow dynamics (Figure 10.3). A complex trajectory wandering between the three object positions was generated in the case of the intact network, whereas a simple trajectory of exact repetitions of moving to the left, to the right, and to the center was generated in the case of the "lesion" in the slow dynamics subnetwork. This implies that the lesion in the subnetwork with slow dynamics deprived the network of the potential to spontaneously combine primitive actions.

Figure 10.3. Comparison of behaviors between an intact network and a "lesioned" network. Trajectories of the manipulated object (a) generated as visual imagery by the original intact network and (b) generated by the same network but with a "lesion" in its subnetwork with slow dynamics. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.
10.1.2 Origin of Spontaneity
The results of the robotics experiments described so far suggest a possible mechanism for generating spontaneous actions and their images in the brain. It is assumed that deterministic chaos emerging in the subnetwork with slow dynamics, possibly corresponding to the prefrontal cortex, might be responsible for spontaneity in sequencing primitive actions by destabilizing the junctions in chunk structures. This agrees well with Freeman's (2000) speculation that intentionality is spontaneously generated by means of chaos in the prefrontal cortex. The isolation of chaos in the prefrontal cortex would make sense, because the robustness of generating physical actions would be lost if chaos governed the whole cortical region. Such isolation of chaos in the higher level of the organized functional hierarchy in the brain might also afford the establishment of two competencies essential to cognitive agency, namely the free selection and combination of actions, and their robust execution in an actual physical environment. Our consideration here is analogous to William James' account of the mechanism of free will, as illustrated in Figure 3.4. He considered that multiple alternatives can be regarded as accidental generations with spontaneous variation from a memory consolidating various experiences, from which one alternative is eventually selected as the next action.
Chaos present at a higher level of the brain may account for this "accidental" generation with spontaneous variation. Also, his metaphoric reference to substantial parts as "perchings" and transient parts as "flights" in theorizing the stream of consciousness might be analogous to the chunk structures and their junctions apparent in the robotics experiments described in section 3.5. What James referred to as intermittent transitions between these perches and flights might also be due to the chaos-based mechanism discussed here. Furthermore, readers may remember the experimental results of Churchland and colleagues (2010) showing that the low-dimensional neural activity during the movement preparatory period exhibits greater fluctuation before the appearance of the target and a more stable trajectory after its appearance. Such fluctuations in neuronal activity, possibly due to chaos originating in higher levels of organization, might facilitate the spontaneous generation of actions and images.

One thing to be noted here is that wills or intentions spontaneously generated by deterministic chaos are not really "freely" generated, because they follow the deterministic causality of internal states. They may look as if generated with some randomness, because the true internal state is not consciously accessible. If we observe action sequences in terms of categorized symbol sequences, they turn out to be probabilistic sequences, as explained by symbolic dynamics (see section 5.1). Mathematically speaking, complete free will without any prior causality may not exist. But it may feel as if free will exists when one has only limited awareness of the underlying causal mechanisms.

Now, I'd like briefly to discuss the issue of deterministic dynamics versus probabilistic processes in modeling spontaneity. The uniqueness of the current model study lies in the fact that deterministic chaos emerges in the process of imitating probabilistic transitions of action primitives, provided that sufficient training sequences are used to induce generalization in learning. This result can be understood as a reversal of the ordinary way of constructing symbolic dynamics, in which deterministic chaos produces probabilistic transitions of symbols, as shown in chapter 5. The mechanism is also analogous to what we have seen about the emergence of chaos in conflicting situations encountered by robots, as described in section 7.2. We might be justified in asking why models of deterministic dynamical systems are considered more essential than models of stochastic processes, such as Markov chains (Markov, 1971). A fundamental
reason for this preference is that models of deterministic dynamical systems more closely represent physical phenomena that take place in continuous time and space, as argued in previous sections. In contrast, Markov chain models, which are the most popular schemes for modeling probabilistic processes, employ discrete state representations obtained by partitioning the state space into substates. The substates are assigned nodes with labels, and the possible state transitions between those states are denoted by arcs, as in the case of a finite-state machine (FSM). The only difference from an FSM is that the arcs represent transition probabilities rather than deterministic paths. In such a discretization scheme, even a slight mismatch between the current state of the model and inputs from the external environment can result in a failure to match: when inputs with unexpected labels arrive, Markov chain models simply halt and refuse to accept them. Dynamical system models, at the very least, can avoid such catastrophic events, because their dynamics develop autonomously. The intrinsic fuzziness in representing levels, primitives, and intentions in dynamical system models such as the MTRNN can give rise to robustness and smoothness in interactions with the physical world.
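A toy illustration of this brittleness, with hypothetical labels rather than anything from the experiments: a discrete transition table simply has no entry for an unexpected input and halts, whereas a continuous dynamical model keeps evolving from whatever nearby state it happens to be in.

    transitions = {"L": {"C": 0.5, "R": 0.5},
                   "C": {"L": 0.5, "R": 0.5},
                   "R": {"L": 0.5, "C": 0.5}}

    def next_distribution(state):
        # Any state outside the table raises KeyError: the discrete model
        # has no notion of "nearby" states to fall back on.
        return transitions[state]

    print(next_distribution("L"))       # {'C': 0.5, 'R': 0.5}
    try:
        next_distribution("L, slightly perturbed")
    except KeyError:
        print("model halts on an unexpected input label")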
10.1.3 Creating Novel Action Sequences
My colleagues and I investigated the capability of MTRNNs for generating diverse combinatorial action sequences by means of chaos developed via the tutored learning of a set of trajectories. In such experiments, we often observed that MTRNNs generated novel movement patterns by combining previously learned segments in mental simulation as well as in actual behaviors (Arie et al., 2009; Arie et al., 2012). In one such humanoid robot experiment involving an object manipulation task, we employed an extended MTRNN model that can cope with dynamic visual images at the pixel level (Arie et al., 2009). In this extension, a Kohonen network model was used for preprocessing the pixel-level visual pattern, similar to the model described in section 7.2. The pixel pattern received at each step was fed into the Kohonen network as a two-dimensional topological map, the low-dimensional winner-take-all activation pattern of the Kohonen network units was input to the MTRNN, and the output of the MTRNN was fed into the Kohonen network to reconstruct the predicted image of the pixel pattern. In training for the object manipulation task, the robot was tutored on a set
of movement sequences for manipulating a cuboid object, utilizing the initial sensitivity characteristics of the slow dynamics context units (Figure 10.4). The tutored sequences started from two different initial conditions. In one, an object (a small block) stood on the base in front of a small table (a large block); from this initial condition, the standing object was either moved to the right side by pushing, put on the small table by grasping, or laid down by hitting. In the other initial condition, the same object lay on the base in front of the table; the object was then either moved to the right or put on the small table. The tutoring was repeated while the position of the object in the initial condition was varied. After learning all the tutored sequences, the network model was tested on the generation of visual imagery as well as actual action. It was observed that the model could generate diverse visual imagery sequences, or hallucinations, both physically possible and impossible ones, depending on the initial slow context state
Figure 10.4. A humanoid robot tutored on five different movement sequences starting from two different initial conditions of a manipulated object: from initial condition 1 (object standing on the base), (1) move the standing object to the right, (2) put the standing object on the table, or (3) lay down the standing object; from initial condition 2 (object laid on the base), (4) move the laid object to the right or (5) put the laid object on the table. Adopted from Arie et al. (2009) with permission.
(representing the intention). For example, in a physically possible case, the network generated an image concatenating a partial sequence of laying down the standing object on the base with one of grasping it to put it on the small table. An example of a physically impossible case, on the other hand, involved a slight modulation of the aforementioned possible case: laying down the standing object and then grasping the lying object and putting it on the small table standing up. Although it is physically impossible for the lying object to suddenly stand up after being put on the table, this strange hallucination appeared because the previously learned partial sequence pattern of grasping the standing object and putting it on the table was wrongly concatenated into the image. In the test of actual action generation, the aforementioned physically possible case was successfully generated, as shown in Figure 10.5.

The experimental results described here are analogous to the results obtained by using an RNNPB model. In section 8.2, it was shown that various action sequences, including novel ones, were generated by changing the PB values in the RNNPB. In the current case using the MTRNN, diverse sequential combinations of movement primitives, including novel combinations, were spontaneously generated by means of chaos or transient chaos organized in the higher level network. It can be said that these robots using the RNNPB or the MTRNN generated something novel by trying to avoid simply falling into their own habitual patterns. It is noted again that novel images can be found in the deep memory developed with relational structures among experienced images through long-term consolidative learning. An analogous observation was obtained in a robotics experiment using the MTRNN on learning to generate compositional action sequences corresponding to the observation of compositional
Figure 10.5. The humanoid robot generated an action by spontaneously concatenating two previously learned movement sequences: laying down the standing object on the base and grasping it to put it on the small table. Adopted from Arie et al. (2009) with permission.
gesture patterns (Park & Tani, 2015). It was shown that, after consolidative learning of tutored exemplars that did not contain all possible combination patterns, novel action sequences could be adequately generated corresponding to observed but unlearned gesture pattern sequences conveying novel compositional semantics.

* * *

This is not the end of the story. An important question still remains unanswered. If we consider the spontaneous generation of actional intentions mechanized by chaos in the PFC to be the origin of free will, why is the awareness of a free decision delayed, as evidenced by Libet's (1985) and Soon's (2008) experiments? Here, let us consider how we recognize our own actions in daily life. At the very beginning of the current chapter, I wrote that, after adding coffee granules and hot water, I "either add sugar or not, which is determined rather unconsciously" and then "notice only later, when I take the first sip, that I actually added sugar." Indeed, in many situations one's own intention is only consciously recognized when confronted with unexpected outcomes. This understanding, moreover, led me to develop a further set of experiments clarifying the structural relationships between the spontaneous generation of intentions for action and the conscious awareness of these intentions by way of the results of said actions. The next section reviews this set of robotics experiments, the last one in this book.
10.2. Free Will, Consciousness, and Postdiction
This final section explores possible mechanisms accounting for the awareness of one's own actional intentions by examining cases of conflictive interactions taking place between the self and others in a robotics experiment. The idea is essentially this: in conflicting situations, spontaneously generated intentions are not completely free but are modified so that the conflict can be reduced, and it is in this interplay that consciousness arises. To illustrate these processes, we conducted a simple robotics experiment. Through the analysis of the experimental results, we attempt to explain why free will becomes consciously noticed only with a delay, immediately before the onset of the actual action.
10.2.1 Model and Robotics Experiment
In this experiment (Murata et al., 2015), two humanoid robots were used: one robot, referred to as the "self robot," was controlled by an extended version of the MTRNN, and the other robot, referred to as the "other robot," was teleoperated by a human experimenter. In each trial, after the right hand of the "other robot" settled in the center position for a moment, the human experimenter commanded the robot to move the hand either in the left ("L") or the right ("R") direction, at random (using a pseudorandom generator). Meanwhile, the "self robot" attempted to generate the same movement simultaneously by predicting the decision made by the "other robot." This trial was repeated several times.

In the learning phase, the "self robot" was trained to imitate the random action sequences of moving either left or right demonstrated by the "other robot" through visual inputs. Because this part of the robot training is analogous to the one described in the last section, it was expected that the robot could learn to imitate the random action sequences by developing chaos in the slow dynamics network of the MTRNN. In the test phase of interactive action generation with the "other robot," the "self robot" was supposed to decide to move its hand either left or right spontaneously at each juncture. At the same time, however, it had to follow the movement of the "other robot" by modifying its own intention when its decision conflicted with the "other robot." It is worth noting here that the chance of conflict is 50%, because the "other robot" moves either left or right at random.

Under the aforementioned task condition, we examined possible interactions between the top-down process of spontaneously generating actional intention and the bottom-up process of modifying the intention by recognizing the perceptual reality, by means of the error regression mechanism, in the conflictive situation. The error regression was applied to update the activation states of the context units in the slow dynamics network over a specific time length, the regression window, in the immediate past. Specifically, the prediction errors for the visual inputs over the last l steps were back-propagated through time to update the activation values of the context units at the -lth step in the slow dynamics network, toward minimizing those errors. This update reconstructs a new image sequence in the regression window in the immediate past, as well as a prediction of the future sequence, by means of the forward dynamics of the whole network.
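The window update just described can be sketched as follows. This is a deliberately simplified stand-in, not the Murata et al. implementation: the network is a toy untrained RNN, the "observed" sequence is random, and finite-difference gradients replace back-propagation through time. Only the overall shape of the computation is the point: optimize the context state at the window onset, then regenerate the past and the future by forward dynamics.

    import numpy as np

    rng = np.random.default_rng(1)
    H, S, WIN = 8, 2, 10                      # context units, sensory dims, window
    Wh = rng.normal(scale=0.4, size=(H, H))   # stand-ins for trained weights
    Wo = rng.normal(scale=0.4, size=(S, H))

    def rollout(c0, steps):
        # Forward dynamics from context state c0, returning predicted sensations.
        c, preds = c0, []
        for _ in range(steps):
            c = np.tanh(Wh @ c)
            preds.append(Wo @ c)
        return np.array(preds)

    def window_error(c0, observed):
        return np.mean((rollout(c0, len(observed)) - observed) ** 2)

    def error_regression(c0, observed, iters=300, lr=0.2, eps=1e-5):
        # Update the context state at the onset of the regression window so
        # that the regenerated window matches what was actually observed.
        c = c0.copy()
        for _ in range(iters):
            base = window_error(c, observed)
            grad = np.zeros_like(c)
            for i in range(H):
                d = np.zeros(H)
                d[i] = eps
                grad[i] = (window_error(c + d, observed) - base) / eps
            c -= lr * grad
        return c

    observed = rng.normal(scale=0.5, size=(WIN, S))   # conflicting sensory input
    c = error_regression(rng.normal(size=H), observed)
    past = rollout(c, WIN)                # overwritten reconstruction of the past
    future = rollout(c, 2 * WIN)[WIN:]    # correspondingly modified future plan

In the actual experiments, of course, the same update was carried out by back-propagating the real prediction errors through the trained MTRNN, running continuously as the "now" step advanced.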
This was all done using a realization of the abstract model proposed in chapter 6 (see Figure 6.2), which can perform regression of the immediate past and prediction of the future simultaneously, online.

The test of robot action generation compared two conditions, namely with and without the error regression scheme. Figures 10.6a and 10.6b show examples of the robot trials with open-loop one-step prediction, as observed in the experiments without and with the error regression, respectively. Both cases were tested with the same conflictive situation, wherein the intention of the "self robot," in terms of the initial state of the slow context units, was set so that the action sequence LLRRL was anticipated, while the "other robot" actually generated the action sequence RRLLR. The profiles of one-step sensory prediction (two representative joint angles of the "self robot" and two-dimensional visual inputs representing the hand position of the "other robot") are shown in the first row, the online prediction error in the second row, and the slow context and fast context activity in the third and fourth rows, respectively. The dotted vertical lines represent the decision points.

It was observed that one-step prediction without the error regression was significantly poorer than prediction with it. In fact, the prediction error became significantly large at the decision points, and in this situation the movement of the "self robot" became erratic. Although the "self robot" seemed to try to follow the movements of the "other robot" by using the sensory inputs, its movements were significantly delayed. Furthermore, at the fourth decision point the "self robot" moved its arm in the direction opposite to that of the "other robot" (see the cross mark in Figure 10.6a). It seems that the "self robot" cannot adapt to the ongoing conflictive situation by means of sensory entrainment alone, because its top-down intention is too strong to be modified. In contrast, one-step prediction using the error regression was quite successful, generating only a spikelike momentary error even at a conflictive decision point (see Figure 10.6b). These results suggest that the error regression mechanism is more effective than the sensory entrainment mechanism for achieving immediate adaptation of the internal neural states to the current situation.
Figure 10.6. The results of the self robot interacting with the other robot by open-loop generation without (a) and with (b) the error regression mechanism. Redrawn from Murata et al. (2015).

Now, we examine how the neural activity represents the perceptual images of past, present, and future as associated with the current intention, and how such images and intentions can be modulated dynamically through iterative interactions between the top-down intentional process and the bottom-up error regression process during the online movement of the robot.
Figure 10.7 shows plots of the neural activity at several "now" steps (the 221st, 224th, and 227th steps, from left to right) in an event when the prediction was in conflict with the immediate sensory input. The plots of the sensory prediction (joint angles and visual inputs), the prediction error, the slow context unit activity, and the fast context unit activity are shown from the first row to the fourth row, respectively. They show profiles for the past and for the future, with the current step of "now" sandwiched between them. The prediction error is shown only for the past, naturally. The regression window is shown as a shaded area in the immediate past. The hand of the "self robot" started to move to the right around the 215th step, after settling in the home position for a moment (see the leftmost panels). It is noted that although the joint angles of the "self robot" were settled, there were dynamic changes in the activity of the fast context units. This dynamic activity prepares a bias to move the hand in a particular direction, in this case to the right. Also, it can be seen that the error rose sharply in the immediate past when the current "now" was at the 221st step. At this moment, the prediction of the "self robot" was betrayed, because the hand of the "other robot" moved to the left.
Figure 10.7. The rewriting of the future by prediction and of the past by postdiction in the case of conflict. Profiles of the sensory prediction, the prediction error, and the activations of the slow and fast context units are plotted from past to future for different current "now" steps: the 221st step in the left panels, the 224th step in the center panels, and the 227th step in the right panels. Each panel shows profiles corresponding to the immediate past (the regression window) with solid lines and to the future with dotted lines. Redrawn from Murata et al. (2015).
Then, the error signal generated was propagated upstream strongly, and the slow context activation state at the starting step of the regression window was modified with effort. Here, we can see a discontinuity in the profiles of the slow context unit activity at the onset of the regression window. This modification caused the overwriting of all profiles of the sensory prediction (reconstruction) and the neural activity in the regression window, by means of the forward dynamics recalculated from the onset of the window (see the panels for the current "now" at the 224th step). The profiles for future steps were also modified accordingly, while the error decreased as the current "now" shifted to the 224th and 227th steps. Then, the arm of the "self robot" moved to the left.

What we have observed here is postdiction1 for the past and prediction for the future (Yamashita & Tani, 2012; Murata et al., 2015), by which one's own action can be recognized only in a "postdictive" manner, when one's own actional intention is about to be rewritten. This structure reminds us of Heidegger's characterization of the dynamic interplay between looking ahead to the future for possibilities and regressing to the conflictive past through reflection, where vivid nowness is born (see section 7.2). Surely, at this point the robot becomes self-reflective about its own past and future! In particular, the rewritten window in our model may correspond to the encompassing narrative history as the space of time in Heidegger's thought. Thus, we are led to a natural inference: people may notice their own intentions in the specious present when confronted with conflicts that must be reduced, with the effort resulting in conscious experience.

10.2.2 Interpretation
Can we apply the aforementioned analysis to account for the delayed awareness of free will? The reader may assume that no conflict should be encountered in just freely pressing a button, as in the Libet experiment. However, our experiments show how conflicts might arise due to the nature of embodied, situated cognition. When an intention unconsciously developed in the higher cognitive level by deterministic chaos
1. Postdiction refers to perceptual phenomena in which a stimulus presented later affects the perception of another stimulus presented earlier (e.g., Eagleman & Sejnowski, 2000; Shimojo, 2014).
Figure 10.8. An account of how free will can be generated unconsciously and how one can become consciously aware of it later: (1) spontaneous generation of intention by chaos in the PFC; (2) the intention drives the lower level; (3) embodiment entails a certain amount of error; (4) the intention modulated by the error becomes conscious.
exceeds a certain threshold, it attempts to drive the lower peripheral parts to generate a particular movement abruptly (see Figure 10.8). However, the lower levels may not be able to respond to this impetus immediately, because the internal neural activity in the peripheral areas, including muscle potential states, may not always be ready to initiate physical body movements according to top-down expectations. It is like a locomotive suddenly starting to move: the freight cars behind cannot follow immediately, and the wheels spin as the system overcomes the resistance of inertia. As the wheels spin, the engineer may slow the engine to optimize the acceleration and get the train going properly. Likewise, in terms of the preceding experimental model, when higher levels do not receive exactly the expected response from lower levels, some prediction error is generated, which calls for a certain modification of the intention for the movement in the direction of minimizing the error. Here, when the intention for the movement that has been developed unconsciously is modified, conscious awareness arises. This consciously aware intention is different from the original unconscious one, because it has already been rewritten by means of postdiction. In short, if actions can be generated automatically and smoothly, exactly as intended in the beginning, they are not accompanied by consciousness. However, when they are generated in response to conflicts arising from the nature of embodiment in the real world, these actions are accompanied by consciousness. This interpretation of our
experimental results is analogous to the aforementioned speculation by Desmurget and colleagues (2009) (see section 4.3) that the parietal cortex might mediate error monitoring between the predicted perceptual outcome of the intended action and the actual one, a process through which one becomes consciously aware. Freeman (2000) also pointed out that action precedes conscious decision, referring to Merleau-Ponty: In reality, the deliberation follows the decision—and it is my secret decision that brings the motives to life (Merleau-Ponty, 1962, p. 506).

On this account, the relationship between free will and consciousness can be explained in the following way: (1) deterministic chaos develops in the higher cognitive brain area; (2) the top-down intention fluctuates spontaneously by means of the chaotic dynamics, without accompanying consciousness; (3) at the moment of initiating a physical action triggered by this fluctuating intention, a prediction error is generated between the intended state and the reality of the external world; (4) the intention, modified by means of the error regression (postdiction), becomes consciously noticed as the cause of the action about to be generated. In terms of human cognition, then, we may say that consciousness is the feeling of one's own embodied neural structure as it physically changes in adaptation to a changing, unpredictable or unpredicted external environment.

Considered this way, Thomas Hobbes (section 3.6) might be right in saying that there is no space left for free will, because every "free" action is determined through deterministic dynamics. The point, however, is that our conscious minds cannot see how they develop deterministically through causal chains in unconscious processes; they only notice that each free action seems to pop out all of a sudden, without any cause. Therefore, we feel as if our intentions or wills were generated freely, without cause. To sum up, my account is that free will exists phenomenologically, whereas third-party observation of the physical processes underlying its appearance tells a different story.

10.2.3 Circular Causality, Criticality, and Authenticity
I explored the further possibility of applying the MTRNN model extended with the error regression mechanism to a scenario of incremental and
interactive tutoring, because such a venture looked so fascinating to me. When I taught a set of movement sequences to the robot, the robot generated various images as well as actual actions by spontaneously combining these sequences (this is analogous to the experimental results shown in section 10.1). While the robot generated such actions, I occasionally interacted with it in order to modify its ongoing movement by grasping its hands. In these interactions, the robot would suddenly initiate an unexpected movement by pulling my hands, and when I pushed them back in a different direction, it would respond in yet another way. Now I understand that novel patterns were more likely to be generated when my response conflicted with that of the robot. This was because the reaction forces generated between the robot's hands and mine were transformed into an error signal in the MTRNN model in the robot's brain, and consequently its internal neural state was modified by means of the resultant error regression process. Such experiences, resulting from the enactment of such novel intentions, can be learned successively and can induce further modification of the memory structure in the robot's brain. Intentions for a variety of novel actions can be generated again from such reconstructed memory structures. What I witnessed is illustrated with a sketch shown in Figure 10.9a.
Figure 10.9. Circular causality. (a) Chain of circular causality: spontaneous generation of novel intention from the memory structure, novel action, unpredicted perception, conscious experience, and restructuring of memory, through interaction with the environment and other agents. (b) Its appearance by means of mutual prediction of the future and regression of the past between a robot and myself.
This sketch depicts a circular causality among (1) the spontaneous generation of intentions, with various proactive actional images developed from the memory structure; (2) the enactment of those actional images in reality; (3) the conscious experience of the outcome of the interaction; and (4) the incremental learning of these new experiences and the resultant reconstruction of the memory structure. Here, an open dynamic structure emerges by way of this circular causality. Consequently, diverse images, actions, and thoughts can be generated, accompanied by spontaneous shifts between conscious and unconscious states of mind, after repeated confrontation and reconciliation between the subjective mind and the objective world.

Furthermore, it is worth noting that the emergent processes described in Figure 10.9a include me as well, as I inserted myself into the circular causality in the robotics experiment described in this section (see Figure 10.9b). When I concentrated on tactile perception of the robot's movement in my grasp, I sometimes noticed that the image of my own next movement popped out suddenly, without my conscious control. I also noticed that the tension between me and the robot occasionally rose to a critical level, from which unexpected movement patterns of mine, as well as of the robot, burst out. Although I may be unable to articulate the mechanics behind such experience in greater detail through unaided introspection alone, I became sure that the interaction between the robot and me exhibited its own "authentic" trajectory. Ultimately, free will or free action might be generated in a codependent manner between "me" and others, each seeking the greatest possibility in the shared social situation of this world. At the same time, finally, I realized that I had conducted robotics experimental studies not only to evaluate the proposed cognitive models objectively, but also to enjoy myself, creating a rich subjective experience in the exploration of my own consciousness and free will through my online interaction with neurodynamic robots.
10.3. Summary
This chapter tackled the problems of consciousness, intention, and free will through the analysis of neurorobotics experimental results. The problems we focused on were how free will for action can emerge and how it can become the content of consciousness. First, our study investigated how intentions for different actions can be generated spontaneously.
It was found that actions can shift from one to another spontaneously when a chaotic attractor develops in the slow dynamics subnetwork in the higher levels of the cognitive brain. This implies that the intention for free action arises from fluctuating neural activity by means of deterministic chaos in the higher cognitive brain area, an interpretation that accords with the experimental results of Libet (1985) and Soon and colleagues (2008). The next question tackled was why conscious awareness of the intention for generating spontaneous actions arises only with a delay, immediately before the actual action is initiated. To consider this question, a robotics experiment simulating conflictive situations between two robots was performed. The experiment used an extended version of the MTRNN model employing an error regression scheme for achieving online modification of the internal neural activity in the conflictive situation. The experimental results showed that spontaneously generated intention in the higher level subnetwork can be modified in a postdictive manner by using the prediction error generated by the conflict. It was speculated that one becomes consciously aware of one's own intention for generating action only via postdiction, when the originally generated intention is modified in the face of conflicting perceptual reality. In the case of generating free actions, as in the experiment by Libet, the delayed awareness of one's own intention can be explained similarly: the conflict emerges between the higher level unconscious intention to initiate a particular movement and the lower level perceptual reality given by embodiment, which results in the generation of prediction error.

These considerations lead us to conjecture that there might be no space for free will, because all phenomena, including the spontaneous generation of intentions, can be explained by causally deterministic dynamics. We nevertheless enjoy a subjective experience of free will: freely chosen actions seem to appear out of a clear sky in our minds, without any cause, since our conscious mind cannot trace their secret development in unconscious processes. Finally, the chapter examined the circular causality appearing among the processes of generating intention, embodying that intention in reality, consciously experiencing the perceived outcomes, and successively learning from such experience in the robot-human interactive tutoring experiment. It was postulated that, because of this circular causality, all processes time-develop in a groundless manner (Varela et al.,
1991) without any convergence to particular situations, whereby images and actions are generated diversely. The vividness and the authenticity of our “selves” might appear especially at a certain criticality under such groundless situations developed through circular causality. And thus, our minds might become ultimately free only when gifted with such groundlessness.
11 Conclusions
Now, having completed the descriptions of our robotics experiment outcomes, this final chapter presents some conclusions drawn from reviewing these experiments.
11.1. Compositionality in the Cognitive Mind
This book began with a quest for a solution to the symbol grounding problem, asking how robots can grasp the meanings of the objective world from their subjective experiences, such as the smell of cool air from a refrigerator or the feeling of one's own body sinking back into a sofa. I considered that this problem originated from Cartesian dualism, wherein René Descartes suggested that the mind is a nonmaterial, thinking thing essentially distinct from the nonthinking, material body, only then to face the "problem of interactionism," that is, expounding how nonmaterial minds can cause anything in material bodies, and vice versa. Actually, today's symbol grounding problem addresses the same concern, asking how symbols, considered as arbitrary shapes of tokens defined in nonmetric space, could interact densely with sensory-motor reality defined in physical and material metric space (Tani, 2014; Taniguchi et al., 2016).
In this book, I attempted to resolve this longstanding problem of mind and body by taking synthetic approaches. The book presents the experimental trials, inspired by Merleau-Ponty's philosophy of embodiment, in which my colleagues and I have engineered self-organizing, nonlinear dynamic systems onto robotic platforms. Our central hypothesis has been that essential cognitive mechanisms self-organize in the form of neurodynamic structures via iterative learning of a continuous flow of sensory-motor experience. This learning grounds higher level cognition in perceptual reality without suffering the disjunction between lower and higher level operations that is often found in hybrid models employing symbolic composition programs. Instead, iterative interactions between top-down, subjective, intentional processes of acting on the objective world and bottom-up recognition of perceptual reality result in the alteration of top-down intention through circular causality. Consequently, our models have successfully demonstrated what Merleau-Ponty described metaphorically as the reciprocal insertion and intertwining of the subject and the object, through which the two become inseparable entities.

It might still be difficult for proponents of cognitivism such as Chomsky to accept this line of thought. As mentioned in chapter 2, the cognitivist's first assumption is that the essential aspects of human cognition can be well accounted for in terms of logical symbol systems, whose substantial strength is that they can support an infinite range of recursive expressions. The second assumption is that sensory-motor or semantic systems are not necessary for the composition or recursion taking place in such symbol systems, and therefore may not be essential components of any cognitive system. However, one crucial question is whether the daily actions and thoughts of human beings need to be supported by such infinite recursive composition in the first place. In everyday situations, a human being speaks only with a limited depth of embedded sentences and makes action plans composed of only a limited number of primitive behavior sequences at each level. An infinite depth of recursive composition is required in neither case. And the series of robotics experiments described in this book confirms this characterization. Our multiple timescale recurrent neural networks (MTRNNs) can learn to imitate stochastic sequences via self-organizing deterministic chaos with the complexity of finite state machines, but not with that of infinite ones. Mathematical studies by Siegelmann (1995) and more recently
by Graves and colleagues (2014) have demonstrated the potential of analog computational models, including recurrent neural networks (RNNs) with external memory for writing and reading, to exhibit computational capabilities beyond the Turing limit. However, the construction of such Turing machines through learning is practically impossible, because the corresponding parameters, such as connectivity weights, can be found only at singular points in the weight space. Such a parameter-sensitive system may not function reliably when situated in the noisy, sensory–motor reality that its practical embodiment requires, even if an equivalent of such a Turing machine might be constructed in an RNN by chance (Tani et al., 2014). The same should hold for ordinary human cognitive processes, which rely on relatively poor working memory characterized by the magic number seven (Miller, 1956).

My work with robots has attempted to model the everyday analog processes of ordinary humans generating behaviors and thoughts characterized by an everyday degree of compositionality. This scope may include the daily utterances of children before the age of 5 or 6, who can compose sentences in their native language without explicitly recognizing their syntactic structures, and also the tacit learning of skilled actions, such as grasping an object to pass it to others without thinking about it, or even making a cup of instant coffee. Our robotics experiments have demonstrated that the self-organization of particular dynamical structures within dynamic neural network models can develop a finite level of compositionality, and that the contents of these compositions can remain naturally grounded in the ongoing flow of perceptual reality throughout this process.

Of course, this is far from the end of the story. Even though we may have created an initial picture of what is happening in the mind, problems and questions remain. For example, a typical concern people often ask me about is whether symbols really don't exist in the brain (Tani et al., 2014). On this count, many electrophysiological researchers have argued for the existence of so-called grandmother cells, based on studies of animal brains in which local firings are presumed to encode specific meanings in terms of a one-to-one mapping. These researchers argue that such grandmother cells might function like symbols. A neurophysiologist once emphatically argued with me, denying the possibility of distributed representations, saying that "this recorded neuron encodes the action of reaching to pull that object." On the contrary, I thought it possible that this neuron could also fire when generating other types of
actions that could not be observed in his experimental setting, in which the movements of the animals were quite constrained. Indeed, recent developments in multiple-cell recording techniques suggest that such mappings are more likely to be many-to-many than one-to-one. Mormann and colleagues' (2008) results from multiple-cell recordings of the human medial temporal lobe revealed that the firing of cells for a particular concept is sparse (around 1% of the cell population fires) and that each cell encodes from two to five different concepts (e.g., an actress's face, an animal shape, and a mathematical formula). Even though concepts are represented sparsely, their representation is not one-to-one but distributed, and so any presumption that something like direct symbolic representations exist in the human brain seems equally to be in error.

That aside, I speculate that we humans use discrete symbols outside of the brain depending on the situation. Human civilization has evolved through the use of outside-brain devices such as pens and paper to write down linguistic symbols, thereby distributing thought through symbolic representations, an aspect of what Clark and Chalmers (1998) have called the "extended mind." This use of external representation, moreover, may be internalized and employed through working memory like a "blackboard" in the brain on which to "write down" our thoughts when we don't have pen or paper handy. In this book, my argument has been that our brain can facilitate everyday compositionality, as in casual conversation or even regular skilled action generation, by combining primitive behaviors without needing to (fully) depend on symbol representation or manipulation in outside-brain devices. Still, when we need to construct complicated plans for solving complex problems, such as job scheduling for a group of people in a company or the basic design of complex facilities or machines, we typically compose these plans into flow charts, schematic drawings, or itemized statements on paper or in other media utilizing symbols. Tasks at this level might be solved by cognitive architectures such as ACT-R, GPS, or Soar. Indeed, these cognitive architectures are good at manipulating symbols as they exist outside of brains by utilizing explicit knowledge or rules. So, this poses the question of how these symbols outside of the brain can be "grounded" in the neurodynamic structures inside the brain. Actually, one of the original inventors of Soar, John Laird, has recently investigated this problem by extending Soar (Laird, 2008). The extended Soar contains additional building blocks that are involved in the learning of tacit knowledge about perception and action generation
without using symbolic representation. These subsymbolic levels are interfaced with symbolically represented short-term memory (STM) at the next level. Next actions are determined by applying production rules to the memory contents in the STM. Similar research trials can be seen elsewhere (Ritter et al., 2000; St Amant & Riedl, 2001; Bach, 2008). Ron Sun (2016) has developed a cognitive architecture, CLARION, which is characterized by interactions between explicit processes realized by symbol systems and implicit processes realized by connectionist networks, under a similar motivation. Although these trials are worth examining, I speculate that the introduction of symbolic representations in the STM in Soar, or at the explicit level in CLARION, might come too early, because such representations can still be developed in a nonsymbolic manner, such as by analog neurodynamic patterns, as I have shown repeatedly in the current book. The essential questions are at which level in the cognitive process external symbols should be introduced and how such symbols can be interfaced with subsymbolic representations. These questions are left for future studies, and there will undoubtedly be many more we will face.
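One conceivable way to make the interface question concrete is sketched below. It is my own illustration, not Soar, CLARION, or a model from this book: continuous hidden states of a trained dynamic network are quantized into discrete tokens by clustering, and a symbolic layer could then operate on the token stream. The synthetic data and all names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for hidden-state trajectories of a trained dynamic network:
# synthetic 16-D activity hovering around four unknown "concept" regions.
centers = rng.normal(size=(4, 16))
states = np.vstack([c + 0.1 * rng.normal(size=(200, 16)) for c in centers])

# Quantize the analog states into discrete tokens by clustering; a
# symbolic layer (production rules, planners) could then operate on
# the resulting token stream.
tokenizer = KMeans(n_clusters=4, n_init=10, random_state=0).fit(states)

def symbolize(hidden_state):
    """Map an analog neurodynamic state to a discrete token ID."""
    return int(tokenizer.predict(hidden_state.reshape(1, -1))[0])

print(symbolize(states[0]), symbolize(states[350]))
```

Where in the processing hierarchy such a quantization should be applied, and whether the resulting tokens remain grounded, are exactly the open questions posed above.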
11.2. Phenomenology

The current book also explored phenomenological aspects of the human mind, including notions of self, consciousness, subjective time, and free will, by drawing correspondences between the outcomes of neurorobotics experiments and some of the literature in traditional phenomenology. Although some may argue that such analysis from the synthetic modeling side can never be more than metaphorical, against this I would argue that models capture aspects essential to a phenomenon, reducing the complexity of a system to only these essential dimensions, and in this way models are not metaphors. They are the systems in question, only simpler, at least insofar as essential dimensions are indeed modeled and nothing more (see further discussion by Jeffrey White [2016]). In this spirit, I believe that interdisciplinary discussion of the outcomes of such neurorobotics experiments can serve to strengthen the insights for connecting aspects of robot and human behaviors more closely. It should be true that human phenomenology, human behavior, and underlying brain mechanisms can be understood only through their mutual constraints imposed on formal dynamical models, as
Varela (1996) pointed out. In this way, robotics experiments of the sort reviewed in this text afford privileged insights into the human condition. To reinforce these insights, let us review these experiments briefly.

In the robot navigation experiment described in section 7.2, it was argued that the "self" might come to conscious awareness when coherence between internal dynamics and environmental dynamics breaks down, when subjective anticipation and perceptual observation conflict. By referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the subject (carpenter) and the object (hammer) form an enactive unity when all of the cognitive and behavioral processes proceed smoothly and automatically. This process is characterized by a steady phase of neurodynamic activity. In the unsteady phase, the distinction between these two becomes explicit, and the "self" comes to be noticed consciously. An important observation was that these two phases alternated intermittently, exhibiting the characteristics of self-organized criticality (Bak et al., 1987). It was considered that authentic being might be accounted for by this dynamic structure.

In section 8.4, I proposed that the problem of segmenting the continuous perceptual flow into meaningful, reusable primitive patterns might be related to the problem of time perception as formulated by Husserl. For the purpose of examining this thought, we reviewed an experiment involving robot imitation learning that uses the RNNPB model. From the analysis of these experimental results, it was speculated that "nowness" is bounded where the flow of experience is segmented. When the continuous perceptual flow can be anticipated without generating error, there is no sense of events passing through time. However, when prediction error is generated, the flow is segmented into chunks by means of a parametric bias vector modification with an effort toward minimizing the error. With this, the passing of time comes to conscious awareness. The segmented chunks are no longer just parts of the flow, but rather represent discrete events that can be consciously identified according to the perceptual categories encoded in our model by the PB vector. In fact, it is interesting to see that the observation of compositional actions by others accompanies momentary consciousness at the moment of segmenting the perceptual flow into a patterned set of primitives. This is because compositional actions generated by others entail potential unpredictability when such actions are composed of primitive acts voluntarily selected by means of the "free will" of the others. Therefore, compositionality in cognition might be related to the
phenomenology of free will and consciousness. If some animals live only on sensory-reflex behaviors, without the ability either to recognize or to generate compositional actions, there might be no space for consciousness or for the experience of free will in their "minds."

In chapter 9, I wrote that the capability of abstraction through hierarchy in the MTRNN can provide robots with a competency of self-narrative about their own actional intentions in mental simulation. I speculated that the reflective selves of robots may originate from this point. Following this argument, chapter 10 was devoted to the relationship between free will and conscious experience in greater depth. From the results of robotics experiments utilizing the MTRNN model (section 10.1), I proposed that intentions for free actions could be generated spontaneously by deterministic chaos in the higher cognitive brain area. Results of the robotics experiment shown in section 10.2 suggest that conscious awareness of the intention developed by such deterministic dynamics can arise only in a postdictive manner, when conflicts arise between top-down prediction and bottom-up reality. This observation was correlated with the account of the delayed awareness of free will reported by Libet (1985). By considering possible situations in which the intention to enact a particular movement generated in the higher level conflicts with the sensory–motor reality as constituted in the lower level, it was proposed that an effort autonomously exerted to reduce the conflict would bring the intention to conscious awareness.

Finally, this chapter suggested that there might be no space for free will from an objective view, because all of the mechanisms necessary for generating voluntary actions can be explained by deterministic dynamics due to causal physical phenomena, as I have shown in our robotics experiments. Though it is true that in our everyday subjective experience we feel as if free will exists, through the results of our neurorobotics experiments we can see that this phenomenon may arise simply because our minds cannot see the causal processes at work in generating each intentional action. Our minds cannot observe the phase space trajectory of chaos developed in the higher cognitive brain area. We are conscious of each intention as if it pops up without any prior cause immediately before the corresponding action is enacted. On this account, we may conclude that free will exists, but merely as an aspect of our subjective experience.

With the relationship between free will and consciousness thus clarified, I will reiterate once more that the problem of consciousness
may not be the hard problem after all. If consciousness is considered to be the first-person awareness of embodied physical processes, then an exhaustive account of consciousness should likewise appear via the explanation of the relationships between the subjective and the objective. This stands to reason, of course, provided that the whole of this universe is also constituted by these two poles, and that nothing exists outside of them (nothing "supernatural"). When subjectivity is exemplified by the top-down pathway of predicting an actional outcome, and objectivity by the bottom-up recognition of the perceptual reality, these poles are differentiable in terms of the gap between them. Consequently, consciousness at each moment should appear as a sense of an effortful process aimed at minimizing this gap. Then, qualia might be a special case of conscious experience that appears when the gap is generated only at the lower perceptual level, in which case the vividness of qualia may originate from the residual prediction error at each instance. Along this line, and more specifically, Friston (2010) would say that it comes from the error divided by the estimated variance (uncertainty) rather than from the error itself.

However, a more essential issue is to understand the underlying structure of consciousness, rather than just a conscious state at a particular moment measured post hoc, whether in terms of integrated information (Tononi, 2008), for example, or in terms of the aforementioned gap or prediction error. We have to explain the underlying structural mechanism accounting for, for example, the stream of consciousness formulated by William James (1892) as spontaneous alternation between conscious and unconscious states. The crucial proposal in the current book is that the circular causality developed between the subjective mind and the objective world is responsible for consciousness and also for the appearance of free will, as these two are dependent on each other within the same dynamic structure. The top-down proactive intention acting on the objective world induces changes in this world, whereas the bottom-up postdictive recognition of such changes, including unexpected ones, may induce changes in memory and intention in the subjective mind. This could result in another emergence of "free" action by means of the potential nonlinearity of the system. In the loop of circular causality, spontaneous shifts between the unconscious state, in terms of the coherent phase, and the conscious state, in terms of the incoherent phase, occur intermittently as the dynamic whole develops toward criticality.
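The point attributed to Friston above can be shown in a few lines. The following is a minimal sketch of precision-weighted prediction error, my own illustration with invented numbers and function names, not the book's model or Friston's full scheme: the same raw gap is weighted very differently depending on the estimated uncertainty.

```python
import numpy as np

def precision_weighted_error(prediction, observation, est_variance):
    error = observation - prediction
    precision = 1.0 / est_variance   # precision = inverse estimated variance
    return precision * error         # weighted error: "surprise" that counts

# The same raw gap of 0.5 feels very different under different uncertainty:
print(precision_weighted_error(0.0, 0.5, est_variance=1.0))   # 0.5: mild
print(precision_weighted_error(0.0, 0.5, est_variance=0.01))  # 50.0: vivid
# On this reading, the "vividness of qualia" would track the weighted
# error at the lower perceptual level, not the raw residual alone.
```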
To sum up, this open dynamic structure developed in the loop of circular causality should account for the autonomy of consciousness and free will. Or, it can be said that this open dynamic structure explains the inseparable nature of the subjective mind and the objective world in terms of autonomous mechanisms moderating the breakdown and unification of this system of self and situation. In conclusion, the criticality developed in this open, dynamic structure might account for the authenticity conceived by Heidegger, which generates a trajectory toward one's ownmost possibility by avoiding merely falling into habitual or conventional ways of acting (Tani, 2009). The reflective selves of robots that can examine their own past and future possibilities should originate from this perspective.
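Because self-organized criticality carries much of the weight in this argument, a generic illustration may help. The following is a toy Bak-Tang-Wiesenfeld sandpile (Bak et al., 1987), offered as a standard textbook example of SOC and not as one of the robot models: slow driving plus a local threshold rule yields intermittent avalanches of all sizes, without any tuned control parameter, analogous to the intermittent breakdowns of coherence discussed above. Grid size and grain count are arbitrary choices.

```python
import numpy as np

def avalanche_sizes(n=20, grains=5000, seed=0):
    """Drive a Bak-Tang-Wiesenfeld sandpile and record avalanche sizes."""
    rng = np.random.default_rng(seed)
    z = np.zeros((n, n), dtype=int)
    sizes = []
    for _ in range(grains):
        r, c = rng.integers(0, n, size=2)
        z[r, c] += 1                      # slow driving: one grain at a time
        size = 0
        while True:
            unstable = np.argwhere(z >= 4)
            if len(unstable) == 0:
                break
            for i, j in unstable:         # local threshold rule: topple
                z[i, j] -= 4
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        z[a, b] += 1      # grains leaving the edge are lost
        sizes.append(size)
    return np.array(sizes)

sizes = avalanche_sizes()
# After a transient, most grains trigger nothing, while rare grains set
# off system-wide avalanches: intermittency at criticality.
print("largest avalanche:", sizes.max(), " mean size:", sizes.mean())
```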
11.3. Objective Science and Subjective Experience

Readers might have noticed that two different attitudes toward conducting robotics experiments appear by turns in Part II of the current book. One type of my robotics experiments focuses more on how adequate action can be generated based on the learning of a rational model of the outer world, whereas the other type focuses more on the dynamic characteristics of possible interactions between the subjective mind and the objective world. For example, chapter 7 employs these two different approaches in the study of robot navigation learning. Section 7.1 described how the RNN model used in mobile robots can develop compositional representations of the outer environment and how these representations can be grounded. On the other hand, section 7.2 explored characteristics of groundlessness (Varela et al., 1991) in terms of fluctuating interaction between the subjective mind and the objective world. Section 8.3 describes the one-way imitation learning of the robot to show that the RNNPB model can learn to generate and recognize a set of primitive behavior patterns by observing the movements of its human partner. Afterward, I introduced the imitation game experiment, in which two-way mutual imitation between robot and human was the focus. It was observed that some psychologically plausible phenomena, such as turn taking of initiative, emerged in the course of the imitation game, reinforcing our emphasis on the interaction between the first-personal subjective and the objective, in this case social, world. In chapter 9, I described
how the MTRNN model can learn compositional action sequences by developing an adequate functional hierarchy in the network model. Then, chapter 10 examined how circular causality can develop among different cognitive processes, for the purpose of investigating the free will problem, using the same MTRNN model. This chapter also reported how novel images and actions can be generated on both the robot and human sides during interactive tutoring of robots by human tutors.

To sum up, my research attitude has been shifting between one side, investigating rational models of cognitive mechanisms from an objective view, and the other side, exploring subjective phenomena by putting myself inside the interaction loop in robotics experiments. Matsuno (1989) and Gunji (Gunji & Konno, 1991) wrote that the former type of research attitude takes the view of a so-called external observer and the latter that of a so-called internal observer. They used the term observation as mostly equivalent to the term interaction. When the relationship between the observer and the observed can alter because of the interactions between them, such an observer is regarded as an internal observer, because it is included in the internal loop of the interactions. The external observer, on the other hand, assumes only one-way, passive observation from observed to observer, without any interactive feedback.

Observation itself consists in a set of embodied processes that are physically constrained in various ways, such as by imprecision in perception and motor generation, time delays in neural activation and body movement, limitations in memory capacity, and so on. Such physical constraints in time and space do not allow the system to be uniquely optimized, and thus give rise to incompleteness and inconsistency. Actually, in our robot experiments, such inconsistencies arise in every aspect of cognitive processing, including action generation, recognition of perceptual outcomes, and the learning of resulting new experiences. However, at the moment of encountering such an inconsistency, the processes cannot simply be terminated. Instead, each process attempts to change its current relations as if it expected that the inconsistency would be resolved sometime in the future, as long as the interaction continues (Gunji & Konno, 1991).

We can experience something analogous to this when we go to a gig of "cutting-edge" contemporary jazz. A brilliant tenor sax player like the late Michael Brecker often started a tune calmly with familiar improvised phrases, but his playing and that of the other band members grew tense
gradually through mutual responses. Near the peak of the tension, likely to break down at any moment, his playing sometimes got stuck for an instant, as his bodily control of blowing or tonguing seemed unable to catch up with his rushing imagery any more. In the next moment, however, an unbelievable tension of sound and phrase burst out. His genuine creativity in such thrilling playing resulted not merely from his outstanding skills in improvising phrases or in perfect control of the instrument, but originated from the urgent struggle to enact his exploding mental image and intention.

It is interesting to note that cognitive minds appear to maintain two processes moving in opposite directions, one toward stability and the other toward instability. Goal directedness can be considered an attempt to achieve the stability of the system by resolving its currently observed inconsistencies. All processes of recognition, generation, and learning can be regarded as goal-directed activities, which can be accounted for, for example, by the prediction error minimization principle employed in our models. These activities are geared toward grounding, as shown in some of our robotics experiments. However, such goal-directed attempts always entail instability, because of their embodiment as well as the potential openness of the adopted environment, which results in groundlessness, as we have witnessed in our other robotics experiments. The coexistence of the stable and the unstable does not allow the system state simply to converge, but imbues the system with the autonomy to generate itinerant trajectories (Tsuda, 2001; Ikegami & Iizuka, 2007; Ikegami, 2013), wherein we can find the vividness of a living system.

Looking back over my research history, I am now sure that both research attitudes are equally important for the goal of understanding the mind via synthesis. On the one side, it is crucial to build rational models of cognition with the goal of optimizing and stabilizing each elementary cognitive process. On the other hand, it is equally crucial to explore dynamic aspects of mind while optimization is yet to be achieved, during the ongoing process of robots acting in the world. The former research can be advanced much further by using recent results from the booming research programs on machine learning and deep learning, in which the connectionist approach employing the error backpropagation scheme has been revived by introducing more elegant mathematics into the models than those of the 1980s. For further advancement of the latter part, we need to explore the methodology
of articulating the subjective experience of the experimenters who are within the interaction loop in the robotics experiment. What we need to do is to further enhance the circular loop between the objective science of modeling cognitive mechanisms and the practice of articulating subjective experience. This exactly follows what Varela and colleagues proposed in The Embodied Mind (Varela et al., 1991) and in their so-called neurophenomenology program (Varela, 1996). Varela and colleagues proposed to build a bridge between mind in science and mind in experience by articulating a dialogue between the two traditions of Western cognitive science and Buddhist meditative psychology (Varela et al., 1991, xviii). Why Buddhist meditation for the analysis of subjective experience? Because the Buddhist tradition of meditation practice, spanning more than 26 centuries, has achieved systematic and pragmatic disciplines for accessing human experience. Parts of the Buddhist meditation disciplines could be applied directly to our problem of how to articulate the subjective experience of the experimenter in the robotics experiment loop.

The Buddhist mindful awareness tradition starts with practices to suspend the habitual attitudes taken for granted in everyday life (Varela et al., 1991). By practicing this suspension of the habitual attitude, meditators become able to let their minds present themselves or go by themselves, by developing a mood of stepping back. Analogously, if we attempt to develop ultimately natural, spontaneous, mindful interactions between robots and humans, we should rid the human subjects of arbitrary assumptions, such as what robots or humans should or should not do, which have been built into the conventional human–robot interaction framework. In my own experience of interacting with the robot as described in section 10.2, when I was more absorbed in the interaction, concentrating on tactile perception of the robot's movement in my grasp, I felt more vividness in the robot's movement and also experienced more spontaneous arousal of kinesthetic imagery for my own movement. The ongoing interaction was dominated neither by my subjectivity nor by the objectivity of the robot. It was like floating in the middle way between the two extremes of subjectivity and objectivity. Such intensive interaction alternated between a more tense, conflictive phase and a more relaxed one, as I already mentioned. It should be noted that the continuance of such subtle interaction depended on how diverse memory patterns were consolidated by developing a generalized deep structure in the dynamic neural network used in the robot. The more deeply the
memory structure develops, the more intriguing the generated images become. The enhancement of the employed models greatly contributes to the realization of sensible interactions between the robots and the human subjects.

In summary, it is highly expected that the goal of understanding the mind can be achieved by making efforts both in objective science and in subjective experience, one investigating more effective cognitive models that assure better performance and scalability, the other practicing to achieve truly mindful interaction with the robots. The true features of the mind should be captured by undertaking research trials that move back and forth between the exploration of objective science and that of subjective experience.
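The alternation between tense and relaxed phases described in this section has a well-known generic analog in dynamical systems, sketched below. This is a standard toy model from the literature on coupled chaotic maps, not one of the robot models: two logistic maps coupled just below their synchronization threshold spend long stretches nearly synchronized (a coherent phase) punctuated by irregular desynchronization bursts (incoherent phases), a behavior known as on-off intermittency. The coupling value and threshold below are illustrative choices; exact statistics depend on the parameters.

```python
import numpy as np

def f(x):
    return 4.0 * x * (1.0 - x)        # fully chaotic logistic map

eps = 0.24                            # coupling just below the sync threshold
x, y = 0.3, 0.31
mismatch = np.empty(20000)
for t in range(20000):
    x, y = ((1 - eps) * f(x) + eps * f(y),
            (1 - eps) * f(y) + eps * f(x))
    mismatch[t] = abs(x - y)          # zero on the synchronized manifold

coherent = mismatch < 1e-3            # "coherent" when nearly synchronized
print("fraction of coherent steps:", coherent.mean())
# Long runs of near-zero mismatch are interrupted by sudden bursts,
# with no external event triggering the switches.
```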
11.4. Future Directions

Although this book has not concentrated on modeling the biological reality of the brain in detail, recent exciting findings in systems-level neuroscience draw me to explore this area of research more explicitly. The sizeable amount of human brain imaging data gathered to date has enabled a global map to be created of both static and dynamic connectivity between all the different cortical areas (Sporns, 2010). Thanks to such data, now might be a good time to start trying to reconstruct a global model of the brain so that we can synthetically examine what sorts of brain functions appear locally and globally under both static and dynamic connectivity constraints. In the process, we may also examine how these models correspond with evidence from neuroscience. An exciting future task might be to build a large-scale brain network, using either rate-coding neural units or spiking neurons, for artificial humanoid brains. Such experiments have already been started by some researchers, including Edelman's group (Fleischer et al., 2007) and Eliasmith (2014), who have introduced millions of spiking neurons into their models. I should emphasize, however, that large scale does not mean a complete replica of real brains. We still need a good abstraction of the biological reality to build tractable models. We may not need to reconstruct the whole brain by simulating the activity of 100 billion biologically plausible neurons interconnected in columnar structures, as aimed at by the Blue Brain project (see section 5.4).
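To fix ideas, here is a minimal sketch of the rate-coding "point mass" abstraction at issue: a single leaky unit whose continuous output summarizes the mean firing rate of a whole pool of spiking neurons, in the spirit of the continuous-time recurrent neural network (CTRNN) used earlier in the book. The parameters, weights, and inputs below are illustrative assumptions, not values from any experiment.

```python
import numpy as np

def step(u, x, W, tau, dt=1.0):
    """One Euler step of a leaky rate unit: tau * du/dt = -u + W @ x."""
    u = u + (dt / tau) * (-u + W @ x)
    rate = 1.0 / (1.0 + np.exp(-u))   # sigmoid turns potential into a rate
    return u, rate

rng = np.random.default_rng(1)
W = rng.normal(scale=0.5, size=(1, 10))   # input weights (illustrative)
u = np.zeros(1)                           # internal potential
for t in range(50):
    x = rng.normal(size=10)               # stand-in sensory input
    u, r = step(u, x, W, tau=5.0)         # slow unit: time constant 5 steps
print(float(r[0]))  # one scalar standing in for a population's activity
```

The time constant tau is what the MTRNN varies across levels to obtain its functional hierarchy; here it simply makes the unit integrate its input slowly.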
Interestingly, it has recently been shown that some connectionist-type neural network models, using several orders of magnitude fewer rate-coding neural units, can exhibit human-level performance in specific tasks such as visual object recognition. The so-called convolutional neural network (CNN; LeCun et al., 1998), developed as inspired by the hierarchical organization of the visual cortex, can learn to classify visual images of hundreds of object types, such as bicycles, cars, chairs, tables, and guitars, in diverse views and sizes, with an error rate of 0.0665, using a training set of 1 million static visual images (Szegedy et al., 2015). Although this classification accuracy is close to that of humans (Szegedy et al., 2015), a surprising fact is that the CNN used, consisting of 30 layers, contains only around a million almost homogeneous rate-coding neural units, whereas the real visual cortex contains 10 billion spiking neurons of hundreds of different morpho-electrical types (see section 5.4). This implies that the activity of ten thousand spiking neurons could be represented by that of a single rate-coding neural unit as a point mass in connectionist models, without degrading the performance level, as I presumed in section 5.4. It can also be inferred that the known diversity in cell types, as well as in synaptic connection types, can be regarded as biological detail that may not contribute to a primary system-level understanding of brain mechanisms, such as how visual objects are classified in brains. Building a large-scale brain network model whose subnetworks correspond to a dozen major brain areas, allocating around 10 million rate-coding neural units in total, may not be so difficult even in the current computational environment of using clusters. Now we can start such an enterprise, which may be referred to as the Humanoid Brain project. The Humanoid Brain project would clarify the mechanisms underlying the functional differentiation observed across local areas in our brains, in terms of downward causation through the functional connectivity and the multiple spatiotemporal scales evidenced in human brains, and through embodiment in terms of the structural coupling of peripheral cortical areas with sensory-motor reality.

Another meaningful line of extension, in terms of neurophenomenological robotics, would be the exploration of the underlying mechanisms of various psychiatric diseases, including schizophrenia, autism, and depression. Actually, my colleagues and I have started studies in this direction, which have already shown initial results. Yamashita and Tani (2012) proposed that disturbance of self, a major symptom of schizophrenia, can be explained as compensation for adaptive behavior by means of the
error regression. A neurorobotics model was built, inspired by Friston's (1998) disconnectivity hypothesis, which suggests that the basic pathology of schizophrenia may be associated with functional disconnectivity in the hierarchical network of the brain (i.e., between prefrontal and posterior brain regions). In the neurorobotics experiment (Yamashita & Tani, 2012) using an MTRNN model, a humanoid robot was trained on a set of behavioral tasks. After the training, a certain amount of perturbation was applied to the connectivity weights between the higher level and the lower level to represent the disconnectivity. When the robot performed the trained tasks with online error regression, inner prediction error was generated because of the introduced disconnectivity. Consequently, the intention state in the higher level was modulated autonomously by the error signal back-propagated from the lower perception level. This observation suggests that aberrant modulatory signals induced by internally generated prediction error might be a source of the patient's feeling that his intention is affected by some outside force. Furthermore, the experimental result of Yamashita and Tani (2012) suggests a hypothetical account of another schizophrenia symptom, cognitive fragmentation (Perry & Braff, 1994), in which patients lack continuity in spatiotemporal perception. It is speculated that such cognitive fragmentation might be caused by frequent occurrences of inner prediction error, because the subjective experience of time passing can be considered to be associated with prediction error at segmentation points in the perceptual flow, as I analyzed in section 8.3.

In future research, the mechanism of autism could be clarified in terms of another type of malfunction in the predictive coding scheme presumed in the brain. Recently, Van de Cruys and colleagues (Van de Cruys et al., 2014) proposed that a hyper-prior with less tolerance for prediction error results in a failure of generalization in learning, which is the primary cause of autism. Intuitively, the prediction network can suffer an overfitting problem, with generalization error, when the top-down pressure for minimizing the error in learning is imposed on the network too strongly. This generalization error in predicting the coming perceptual state could be considered a main cause of autism, given the accumulated evidence on the patients' typical symptom that they are significantly good at learning by rote but lack capability in structural learning (Van de Cruys et al., 2014; Nagai & Asada, 2015). A robotic experiment to reconstruct the symptom could be conducted by modeling the hyper-prior, implementing the estimation of inverse precision used
in the Bayesian predictive coding framework (Friston, 2005; Murata et al., 2015), as Van de Cruys and colleagues (2014) rationalized that overestimation of the precision under noisy real-world circumstances can result in overfitting of the prediction model. Future studies should examine other psychiatric diseases, including attention deficit hyperactivity disorder and obsessive–compulsive disorder. In summary, if a particular neurorobotics model represents a good model of the human mind, it should also be able to account for the underlying mechanisms of these common psychiatric pathologies, because the brain structures of these patients are known to be not so different from normal ones.

Another crucial question is how much we can scale up the neurorobots described in this book, as I know well that my robots can still work only in toy environments. Although I would say that the progress made in neurorobotics has thus far been steady, actually scaling robots to near-human level might be very difficult. Confronted with this challenge, recent pragmatic studies of deep learning (Hinton et al., 2006; Bengio et al., 2013) have revived aging connectionist approaches, supercharged with the huge computational power latent within (multiple) graphics processing units in standard desktop PCs. Already, some deep learning schemes have demonstrated significant advances in perception and recognition capabilities by using millions of exemplar datasets for learning. For example, a convolutional neural network (LeCun et al., 1998) can perform visual object classification with near-human-level performance by learning (Szegedy et al., 2015), as described previously in this subsection, and a speech recognition system provided a far better recognition rate on noisy speech signals from unspecified speakers than widely used, state-of-the-art commercial speech recognition systems (Hannun et al., 2014). The handwriting recognition system using long short-term memory by Doetsch and colleagues (2014) demonstrated almost human-equivalent recognition performance. Such promising results seem to justify some optimism that artificial upscaling to human-like cognitive capabilities using these methods may not be so difficult. Optimists may say that these systems can exhibit near-human-level perceptual capabilities. Although this should be true for recognition in a single modality of perceptual channel, it is clear that deep understanding of the world at a human level cannot be achieved just by this. Such understanding should require associative integration among multiple modalities of perceptual flows, experienced through iterative interactions of the agents with the world.
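Returning briefly to the hyper-prior account of autism a few paragraphs above, its core intuition can be reproduced in a toy regression. This is my own analogy, not Van de Cruys and colleagues' model: a learner with zero tolerance for prediction error fits the noise in its limited experiences (learning by rote), whereas a learner that tolerates some residual error extracts the underlying structure and generalizes. The ridge penalty below stands in for error tolerance; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x_tr = np.linspace(-1, 1, 10)
y_tr = np.sin(np.pi * x_tr) + 0.3 * rng.normal(size=10)  # noisy experience
x_te = np.linspace(-1, 1, 200)
y_te = np.sin(np.pi * x_te)                               # true structure

def fit(x, y, ridge):
    """Degree-9 polynomial fit; ridge > 0 tolerates residual error."""
    X = np.vander(x, 10)
    return np.linalg.solve(X.T @ X + ridge * np.eye(10), X.T @ y)

for ridge, label in [(0.0, "hyper-precise"), (0.1, "error-tolerant")]:
    w = fit(x_tr, y_tr, ridge)
    rmse = lambda x, y: np.sqrt(np.mean((np.vander(x, 10) @ w - y) ** 2))
    print(f"{label:14s} train={rmse(x_tr, y_tr):.3f} test={rmse(x_te, y_te):.3f}")
# The hyper-precise fit reproduces the training data almost exactly yet
# predicts new inputs poorly: good rote memory, weak generalization.
```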
Regardless, these and other recent advances in deep learning suggest that neurorobotics studies could be scaled significantly with the aforementioned large-scale brain network model if massive training libraries are used alongside multimodal, high-dimensional perceptual flows, including the pixel-level visual stream, like the one shown by Hwang et al. (2015) briefly described in section 9.3, and tactile sensation via hundreds of thousands of points of contact covering an entire "skin" surface; likewise for auditory signals, olfactory "organs," and so on. So empowered, with online experience actively associated with its own intentional interaction with the world, deep minds near the human level might appear as a consequence. When a robot becomes able to develop subjective, proactive self-images in huge numbers of dimensions, alongside its own unique "real-time" perceptual flow as it interacts with the world, we may approach the reconstruction of real human minds!

Attempts to scale neurorobots toward human-like beings are, of course, scientifically fascinating, and the developmental robotics community has already begun investigating this issue seriously (Kuniyoshi & Sangawa, 2006; Oudeyer et al., 2007; Asada et al., 2009; Metta et al., 2010; Asada, 2014; Cangelosi & Schlesinger, 2015; Ugur et al., 2015). However, what is crucially missing from current models is general intelligence, by way of which various tasks across different domains can be completed by adaptively combining available cognitive resources through functions such as inference, induction, inhibition of habituation, imitation, improvisation, simulation, working memory retrieval, and planning, among many others. One amazing aspect of human competency is that we can perform a wide variety of tasks, such as navigating, dancing, designing intricate structures, cleaning rooms, talking with others, painting pictures, deliberating over mathematical equations, and searching the Internet for information on neurorobotics, simply to name a few. Compared with this, what our robots can do is merely navigate a given workspace or manipulate simple objects. So, taking our work one stage further logically involves educating robots to perform multiple-domain tasks toward multiple goals with increasing degrees of complexity. Success in this endeavor should lead to a more general intelligence.

Toward this end, the crucial question becomes how to increase the amount of learning. This is not easy, however, because we cannot train robots simply by connecting them to the Internet or to a database. Robots must act on the physical environment to acquire their own experiences. So, researchers must provide a certain developmental
educational environment wherein robots can be tutored every day for months or possibly for years. And, as robots must be educated within various task domains, this environment is necessarily more complex than a long series of still photos.

In considering the developmental education of robots, an essential question that still remains is how humans, or artifacts like robots, can acquire structural representations of the world by learning through experience under the constraint of the "poverty of the stimulus," as Noam Chomsky (1972) once asked. This is to ask how generalization in learning can be achieved, for example, in robots with a limited amount of tutoring experience. For this question, developmental robotics could provide a possible solution by using the concept of staged development considered by Piaget (1951). The expectation is that learning in one developmental stage can provide a "prior" for learning in the next stage, by which the dimensionality of the learning can be drastically reduced, and therefore generalization with a smaller amount of tutoring experience becomes possible; a toy sketch of this idea follows at the end of this passage. Based on this conception, developmental stages would proceed from the physically embodied level to more symbolic levels. Trials should require a lengthy period wherein physical interactions between robots and tutors involve "scaffolding," guiding support provided by tutors that enables the bootstrapping of the cognitive and social skills required in the next stage (Metta et al., 2010). With scaffolding, higher level functions are entrained alongside foundational perceptual abilities during tutoring, and the robot's cognitive capacities develop from the grounding of simple sensory-motor skills to more complex, compositional cognitive ones. It could happen that the earlier stages require merely sensory–motor-level interaction with the environment, physically guided by tutors, whereas the later stages provide tutoring more in a demonstration-and-imitation style, without physical guidance. The very final stage of education may require only the use of virtual environments (like learning from watching videos) or symbolically represented materials (like reading books). For the implementation of such staged tutoring and development of robots, research on methods for the tutor or educator side may become equally important.

In the aforementioned developmental tutoring process, a robot should not be a passive learner. Rather, it should be an active learner that acts "creatively" in exploring the world, not merely repeating acquired skills or habits. For this purpose, robots should become authentic beings, as I have mentioned repeatedly, by reflecting seriously on their own past and by acting proactively toward their ownmost possibility, shared with their tutors.
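The following is the promised toy sketch of the staged-development idea: a representation learned in an earlier stage acts as a "prior" that reduces the dimensionality of later learning, so that a handful of tutored examples suffice. It is my own illustration, not a model from this book; the dimensions and sample counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stage 1: abundant, untutored sensorimotor experience. The 100-D
# observations secretly vary along only 3 latent directions.
basis = rng.normal(size=(3, 100))
experience = rng.normal(size=(5000, 3)) @ basis

# The earlier stage extracts the low-dimensional structure (PCA via SVD);
# this learned encoder is the "prior" carried into the next stage.
_, _, vt = np.linalg.svd(experience - experience.mean(0), full_matrices=False)
encoder = vt[:3]                      # maps 100-D observations to 3-D codes

# Stage 2: a new task tutored with only 12 examples (a linear target).
w_true = rng.normal(size=100)
x_few = rng.normal(size=(12, 3)) @ basis
y_few = x_few @ w_true

# Twelve equations cannot pin down 100 raw weights; restricting learning
# to the 3 stage-1 coordinates makes the problem well posed.
z_few = x_few @ encoder.T
w_latent, *_ = np.linalg.lstsq(z_few, y_few, rcond=None)

x_new = rng.normal(size=(200, 3)) @ basis
pred = (x_new @ encoder.T) @ w_latent
print("test RMSE with the stage-1 prior:",
      np.sqrt(np.mean((pred - x_new @ w_true) ** 2)))  # near zero
```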
Tutoring interaction between such active learner robots and human tutors will inevitably become highly intensive at times. To carry out long-term and sometimes intensive educational interactions, the development of emotions within the robot would be an indispensable aid. Although this issue has been neglected in this book, to take care of robots like children, human tutors would require emotional responses from the robots. Otherwise, many human tutors may not be able to continue cordial interactions with stone-cold, nonliving machines for such long periods. The development of adequate emotional responses should deepen bonds between tutors and robots, by which long-term, affectively reinforced education would become possible. Minoru Asada has proposed so-called affective developmental robotics (Asada, 2015), in which he assumes multiple stages of emotional development from simple to complex, including emotional contagion, emotional empathy, cognitive empathy, and sympathy. His crucial premise is that the development of emotion and that of embodied social interaction are codependent. Consequently, the long-term educational processes of robots by human caregivers should be accompanied by these two codependent channels of development.

Finally, a difficult but important problem to be considered is whether artifacts can embody and express moral virtue. Aristotle says that moral virtues are not innate, but can be acquired through habitual practice. It is said that an individual becomes truthful by acting truthfully, or becomes unselfish by acting unselfishly. Simultaneously, human beings are motivated to do something "good" for others because they share in the consequences of their actions by means of mirror neurons. The net effect is that, as one human seeks happiness for him- or herself, he or she experiences happiness in bringing happiness to others similarly embodied. In principle, robots can do the same by learning the effects of their own actions on the happiness expressed by others, reinforced through mirroring neural models. I would like to prove that robots can be developed or educated to acquire not only sophisticated cognitive competency but also moral virtue. Nowadays, robots may start to have "free will," as I have postulated in this book. This means that those robots could also happen to generate bad behaviors toward others by their own wills. However, if the robots can learn about moral virtue, they would generate only good behaviors, inhibiting themselves from generating bad ones. Such robots would contribute to true happiness in a future society in which humans and robots coexist.
11.5. Summary

This final section overviews the whole book once again for the purpose of providing final concluding remarks. This book sought to account for subjective experience, characterized on the one hand by the compositionality of higher-order cognition and on the other by fluid and spontaneous interaction with the outer world, through the examination of synthetic neurorobotics experiments conducted by the author. In essence, this is to inquire into the essential, dynamical nature of the mind. The book was organized into two parts, namely "Part I: On the Mind" and "Part II: Emergent Minds: Findings from Robotics Experiments." In Part I, the book reviewed how different questions about minds have been explored in different research fields, including cognitive science, phenomenology, brain science, psychology, and synthetic modeling. Part II started with new proposals for tackling open problems through neurorobotics experiments. We once again look at each chapter briefly, to summarize.

Part I started with an introduction to cognitivism in chapter 2, emphasizing "compositionality," considered to be a uniquely human competency whereby knowledge of the world is represented by utilizing symbols. Some representative cognitive models were introduced that address the issues of problem solving in problem spaces and the abstraction of information by using "chunking" and hierarchy. This chapter suggested, however, the potential difficulty of utilizing symbols internal to the mechanics of minds, especially in attempting to ground symbols in real-time, online, sensory-motor reality and context. Chapter 3, on phenomenology, introduced views on the mind from the other extreme, emphasizing direct or pure experiences prior to their articulation with particular knowledge or symbols. The chapter covered the ideas of subjective time by Husserl, being-in-the-world by Heidegger, embodiment by Merleau-Ponty, and the stream of consciousness by James. By emphasizing the cycle of perception and action in the physical world via embodiment, we explored how philosophers have tackled the problem of the inseparable complex that is the subjective mind and the objective world. It was also shown that notions of consciousness and free will may be clarified through phenomenological analysis. Chapter 4 attempted to explain how human brains can support cognitive mechanisms through a review of current knowledge in the field of neuroscience. To start with, we looked at a possible hierarchy in brains
that supports complex visual recognition and action generation. We then considered the possibility that two cognitive functions, generating actions and recognizing perceptual reality, are just two sides of the same coin, by reviewing empirical studies on mirror neurons and the parietal cortices. This chapter also examined the issue of the origin of free will by reviewing the experimental study conducted by Libet (1985). Despite the recent accumulation of various experimental findings in neuroscience, these chapters concluded that it is not yet possible to attain a complete understanding of the neuronal mechanisms accounting for the cognitive functions of interest, due to conflicting evidence and the limitations inherent in experimental observation in neuroscience. Chapter 5 introduced the dynamical systems approach for modeling embodied cognition in both natural and artificial systems. The chapter began with a tutorial on nonlinear dynamical systems. Following this tutorial, the chapter described Gibsonian and neo-Gibsonian ideas in psychology that fit quite well with the dynamical systems framework, and also explained how they have influenced the communities of behavior-based robotics and neurorobotics. Some representative neurorobotics studies were introduced, investigating how primitive behaviors can develop and be explained from the dynamical systems perspective.

Chapter 6, as the first chapter of Part II, proposed new paradigms for understanding cognitive minds by taking a synthetic approach utilizing neurorobotics experiments. First, the chapter noted the potential difficulty of clarifying the essence of minds by pursuing only the bottom-up pathway emphasized by the behavior-based approach. It was then argued that what is missing are the top-down subjective intentions for acting on the objective world and their iterative interaction with bottom-up perceptual reality. It was speculated that human-like capabilities for dealing with compositional language-thoughts, or even much simpler cognitive schemes, should emerge as the result of iterative interactions between these two pathways, top to bottom and bottom to top, rather than by one-way processes along the bottom-up pathway alone. It was furthermore speculated that a key to solving the so-called hard problem of consciousness and free will could be found through close examination of such interactions.

Based on the thoughts described in chapter 6, the new challenges discussed in chapters 7 through 10 concerned the reconstruction of various cognitive or psychological behaviors in a set of synthetic neurorobotics experiments. In these robotics studies, our research focus went
back and forth between two fundamental issues. On the one hand, we explored how compositionality for cognition can develop via iterative sensory–motor-level interactions of agents with their environments, and how these compositional representations can be grounded. On the other hand, we also examined the codependent relationship between the subjective mind and the objective world that emerges in their dense interaction, for the purpose of investigating the underlying structure of consciousness and free will.

In the first half of chapter 7, we investigated the development of compositionality by reviewing a robotics experiment on predictive navigation learning using a simple RNN model. The experimental results showed that the compositionality hidden in the topological trajectories of the obstacle environment can be extracted as embedded in a global attractor with fractal structure in the phase space of the RNN model. It was shown that the compositional representation developed in the RNN can be naturally grounded in the physical environment by allowing iterative interactions between the two in a shared metric space. In the second half of chapter 7, on the other hand, we explored a sense of groundlessness (a sense of not being grounded completely) through the analysis of another navigation experiment. It was shown that the developmental learning process during exploration switched spontaneously between coherent and incoherent phases as chain reactions took place among the different cognitive processes of recognition, prediction, perception, learning, and acting. By referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the distinction between the two poles of the subjective mind and the objective world becomes explicit in the breakdown, as shown in the incoherent phase whereby the "self" rises to conscious awareness. We drew the conclusion that the open dynamic structure characterized by self-organized criticality (SOC) can account for the underlying structure of consciousness by way of which the "momentary self" appears spontaneously.

Chapter 8 introduced the RNNPB as a model of the mirror neurons that have been considered crucially responsible for the composition and decomposition of actions. The RNNPB can learn a set of behavior primitives for generation as well as for recognition by means of error minimization in a predictive coding framework. The RNNPB model was evaluated through a set of robotics experiments, including the learning of multiple movement patterns, the imitation game, and the associative learning of protolanguage and action, whereby the following characteristics
emerged. (1) The model can recognize aspects of a continuous perceptual flow by segmenting it into a sequence of chunks or reusable primitives; (2) a set of actional concepts can be learned with generalization by developing relational structures among those concepts in the neural activation space, as shown in the experiment on associative learning between protolanguage and actions; and (3) the model can generate not only learned behavior patterns but also novel ones, by means of twists or dimples generated in the manifold of the RNNPB due to the potential nonlinearity of the network.

Chapter 9 addressed the issue of hierarchy in cognitive systems. For this purpose, we proposed a dynamic model, the MTRNN, characterized by its multiple timescales, and examined how a functional hierarchy for action can develop in the model through robotics experiments employing it. Results showed that a set of behavior primitives developed in the fast-timescale network at the lower level, while the whole action plan sequencing the behavior primitives developed in the slow-timescale network at the higher level. It was also found that the initial neural activation state in the slow-timescale network encoded the top-down actional intention that triggers the generation of a corresponding slow dynamics trajectory in the higher level, which in turn triggers the projection of an intended sequence of behavior primitives from the lower level of the network to the outer world. It was concluded that a sort of "fluid compositionality" for the smooth and flexible generation of actions was achieved in the proposed MTRNN model through the self-organization of a functional hierarchy, by adopting neuroscientifically plausible constraints including timescale differences among different local networks and structural connectivity among them as downward causation.

Chapter 10 considered two problems about free will: one involving its origin, and the other the conscious awareness of it. From the results of experiments employing the MTRNN model, I proposed that actional intention can be generated spontaneously by means of chaos in the higher cognitive brain areas. It was postulated that intention or will developed unconsciously in the higher cognitive brain by chaos would come to conscious awareness only in a postdictive manner. More specifically, when a gap emerges between the top-down intention for acting and the bottom-up perception of reality, the intention may be noticed as the effort of minimizing this gap is exercised. Furthermore, the chapter examined the circular causality developed among different cognitive processes in human–robot interactive tutoring
experiments. It was conjectured that free will could exist in the subjective experience of the human experimenter as well as of the robot, each seeking their ownmost possibility in their conflictive interaction, when they feel as if whatever creative image for the next act could pop out freely in their minds. The robot, as well as the human, at such moments could be regarded as an authentic being.

Finally, some concluding remarks. The argument presented here leads to the following:

1. The mind should emerge via intricate interactions between the top-down subjective view for proactively acting on the external world and the bottom-up recognition of the perceptual reality.

2. Structures and functions constituting the mechanisms driving higher-order cognition, such as the compositional manipulation of symbols, concepts, or linguistic thoughts, may develop by means of the self-organization of neurodynamic structures through the aforementioned top-down and bottom-up interactions, aiming at the reduction of any apparent conflict between these two processing streams. It is presumed that such compositional cognitive processes, embedded in neurodynamic attractors, can be naturally grounded in the physical world, provided they share the same metric space for interaction.

3. Images or knowledge can be developed through multiple stages of learning from an agent's limited experiences. In the first stage, each instance of experience is acquired; in the second stage, generalized images or concepts are developed by extracting relational structures among the acquired instances in memory; in the third stage, novel or creative structures can be found in the memory developed with nonlinearity. Such a developmental process should take place in a large network consisting of the PFC, the parietal cortex, and the sensory–motor peripheral areas, which are assumed to be the neocortical target of consolidative learning in humans and other mammals.

4. However, the most crucial aspect of minds is the sense of groundlessness that arises through circular causality, understood in the end as the inseparability of subjectivity and the objective
world. This understanding could shed light on the hard problem of consciousness and its relationship to the problem of free will, through the unification of theoretical studies on SOC in the evolved holistic dynamics with Heidegger's thoughts on authenticity.
5. The exploration of cognitive minds should continue in close dialogue between objective science and subjective experience (as suggested by Varela and others), to which synthetic approaches, including cognitive, developmental, or neuronal robotics, can contribute by providing effective research platforms.
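As a hedged illustration of the gap-reduction process referred to in point 2 (and in the postdiction account of Chapter 10), the conflict between the two streams can be pictured as iterative regression of the prediction error into the intention state while the already-learned forward mapping is held fixed. The linear-plus-tanh forward map, the learning rate, and all names below are illustrative assumptions, not the book's implementation.

```python
import numpy as np

# Illustrative sketch of "gap minimization": the error between a top-down
# prediction f(intention) and the bottom-up percept is reduced by updating
# the intention state itself, with the forward mapping held fixed.
# The forward map W and all names here are arbitrary assumptions.
rng = np.random.default_rng(1)

W = rng.normal(0.0, 0.5, (8, 4))   # fixed, already-learned forward (prediction) map

def predict(intention):
    # Top-down prediction of the percept from the current intention state.
    return np.tanh(W @ intention)

percept = rng.normal(0.0, 1.0, 8)  # bottom-up perceptual reality
intention = np.zeros(4)            # current top-down intention state

learning_rate = 0.1
for _ in range(200):
    prediction = predict(intention)
    gap = prediction - percept     # conflict between the two streams
    # Regress the error into the intention state only; the world model (W)
    # stays fixed. Gradient of 0.5 * ||gap||^2 with respect to the intention:
    grad = W.T @ ((1.0 - prediction**2) * gap)
    intention -= learning_rate * grad
```

In this picture, the residual error that cannot be regressed away quickly corresponds to the "gap" that, on the account given above, may be what reaches conscious awareness.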
Glossary for Abbreviations
BPTT    back-propagation through time
CPG     central pattern generator
CTRNN   continuous-time recurrent neural network
DOF     degree of freedom
EEG     electroencephalography
fMRI    functional magnetic resonance imaging
IPL     inferior parietal lobe
LGN     lateral geniculate nucleus
LIP     lateral intraparietal area
LSBN    large-scale brain network
LSTM    long short-term memory
LRP     lateralized readiness potential
M1      primary motor cortex
MST     medial superior temporal area
MSTNN   multiple spatiotemporal neural network
MT      middle temporal area
MTRNN   multiple-timescale recurrent neural network
PB      parametric biases
PC      parietal cortex
PCA     principal component analysis
PFC     prefrontal cortex
PMC     premotor cortex
PMv     ventral premotor area
RNN     recurrent neural network
RNNPB   recurrent neural network with parametric biases
RP      readiness potential
SMA     supplementary motor area
SOC     self-organized criticality
STS     superior temporal sulcus
TEO     inferior temporal area
TPJ     temporoparietal junction
V1      primary visual cortex
VIP     ventral intraparietal area
VP      visuo-proprioceptive
References
Aihara, K., Takabe, T., & Toyoda, M. (1990). Chaotic neural networks. Physics Letters A, 144, 333–340.
Aristotle. (1907). De anima (R. D. Hicks, Trans.). Oxford: Oxford University Press.
St Amant, R., & Riedl, M. O. (2001). A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 55(1), 15–39.
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, 3, 299–307.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Andry, P., Gaussier, P., Moga, S., Banquet, J. P., & Nadel, J. (2001). Learning and communication via imitation: An autonomous robot perspective. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 31(5), 431–442.
Arbib, M. A. (1981). Perceptual structures and distributed motor control. In V. B. Brooks (Ed.), Handbook of physiology: The nervous system. II. Motor control (pp. 1448–1480). Bethesda, MD: American Physiological Society.
Arbib, M. (2010). Mirror system activity for action and language is embedded in the integration of dorsal and ventral pathways. Brain & Language, 112, 12–24.
Arbib, M. (2012). How the brain got language: The mirror system hypothesis. New York: Oxford University Press.
Arie, H., Endo, T., Arakaki, T., Sugano, S., & Tani, J. (2009). Creating novel goal-directed actions at criticality: A neuro-robotic experiment. New Mathematics and Natural Computation, 5(1), 307–334.
Arnold, L. (1995). Random dynamical systems. Berlin: Springer.
Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y., Ogino, M., & Yoshida, C. (2009). Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development, 1(1), 12–34.
Asada, M. (2015). Towards artificial empathy: How can artificial empathy follow the developmental pathway of natural empathy? International Journal of Social Robotics, 7(1), 19–33.
Bach, J. (2008). Principles of synthetic intelligence: Building blocks for an architecture of motivated cognition. New York: Oxford University Press.
Bach, K. (1987). Thought and reference. Oxford: Oxford University Press.
Badre, D., & D'Esposito, M. (2009). Is the rostro-caudal axis of the frontal lobe hierarchical? Nature Reviews Neuroscience, 10, 659–669.
Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality: An explanation of the 1/f noise. Physical Review Letters, 59, 381–384.
Baldwin, D., Andersson, A., Saffran, J., & Meyer, M. (2008). Segmenting dynamic human action via statistical structure. Cognition, 106, 1382–1407.
Balslev, D., Nielsen, F. A., Paulson, O. B., & Law, I. (2005). Right temporoparietal cortex activation during visuo-proprioceptive conflict. Cerebral Cortex, 15(2), 166–169.
Baraglia, J., Nagai, Y., & Asada, M. (in press). Emergence of altruistic behavior through the minimization of prediction error. IEEE Transactions on Cognitive and Developmental Systems.
Bassett, D. S., & Gazzaniga, M. S. (2011). Understanding complexity in the human brain. Trends in Cognitive Sciences, 15(5), 200–209.
Beer, R. D. (1995a). On the dynamics of small continuous-time recurrent neural networks. Adaptive Behavior, 3(4), 471–511.
Beer, R. D. (1995b). A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, 72(1), 73–215.
Beer, R. D. (2000). Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4(3), 91–99.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Billard, A. (2000). Learning motor skills by imitation: A biologically inspired robotic model. Cybernetics and Systems, 32, 155–193.
Blakemore, S.-J., & Sirigu, A. (2003). Action prediction in the cerebellum and in the parietal cortex. Experimental Brain Research, 153(2), 239–245.
Bor, D., & Seth, A. K. (2012). Consciousness and the prefrontal parietal network: Insights from attention, working memory, and chunking. Frontiers in Psychology, 3, 63.
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.
Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6, 3–15.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence Journal, 47, 139–159.
Campbell, D. T. (1974). "Downward causation" in hierarchically organized biological systems. In Studies in the philosophy of biology (pp. 179–186). London: Macmillan Education UK.
Cangelosi, A., & Schlesinger, M. (2015). Developmental robotics: From babies to robots. Cambridge, MA: MIT Press.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219.
Chomsky, N. (1972). Language and mind. New York: Harcourt Brace Jovanovich.
Choi, M., & Tani, J. (2016). Predictive coding for dynamic vision: Development of functional hierarchy in a multiple spatio-temporal scales RNN model. arXiv preprint arXiv:1606.01672.
Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.
Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G. S., Newsome, W. T., Clark, A. M., Hosseini, P., Scott, B. B., Bradley, D. C., Smith, M. A., Kohn, A., Movshon, J. A., Armstrong, K. M., Moore, T., Chang, S. W., Snyder, L. H., Lisberger, S. G., Priebe, N. J., Finn, I. M., Ferster, D., Ryu, S. I., Santhanam, G., Sahani, M., & Shenoy, K. V. (2010). Stimulus onset quenches neural variability: A widespread cortical phenomenon. Nature Neuroscience, 13(3), 369–378.
Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Nuyujukian, P., Foster, J. D., Ryu, S. I., & Shenoy, K. V. (2012). Structure of neural population dynamics during reaching. Nature, 487, 51–56.
Clark, A. (1998). Being there: Putting brain, body, and world together again. Cambridge, MA: MIT Press.
Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19.
Clark, A. (1999). An embodied cognitive science? Trends in Cognitive Sciences, 3(9), 345–351.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.
Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1, 372–381.
Cliff, D., Husbands, P., & Harvey, I. (1993). Explorations in evolutionary robotics. Adaptive Behavior, 2(1), 73–110.
Crutchfield, J. P., & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63, 105–108.
Dale, R., & Spivey, M. J. (2005). From apples and oranges to symbolic dynamics: A framework for conciliating notions of cognitive representation. Journal of Experimental & Theoretical Artificial Intelligence, 17(4), 317–342.
Delcomyn, F. (1980). Neural basis of rhythmic behavior in animals. Science, 210, 492–498.
Demiris, Y., & Hayes, G. (2002). Imitation as a dual-route process featuring predictive and learning components: A biologically plausible computational model. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 327–361). Cambridge, MA: MIT Press.
Dennett, D. (1993). Review of F. Varela, E. Thompson and E. Rosch (Eds.), The embodied mind. American Journal of Psychology, 106, 121–126.
Desmurget, M., & Grafton, S. (2000). Forward modeling allows feedback control for fast reaching movements. Trends in Cognitive Sciences, 4(11), 423–431.
Desmurget, M., Reilly, K. T., Richard, N., Szathmari, A., Mottolese, C., & Sirigu, A. (2009). Movement intention after parietal cortex stimulation in humans. Science, 324, 811–813.
Devaney, R. L. (1989). An introduction to chaotic dynamical systems (Vol. 6). Reading, MA: Addison-Wesley.
Diamond, A. (1991). Neuropsychological insights into the meaning of object concept development. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and knowledge (pp. 67–110). Hillsdale, NJ: Erlbaum.
Di Paolo, E. A. (2000). Behavioral coordination, structural congruence and entrainment in a simulation of acoustically coupled agents. Adaptive Behavior, 8(1), 27–48.
Doetsch, P., Kozielski, M., & Ney, H. (2014). Fast and robust training of recurrent neural networks for offline handwriting recognition. In IEEE 14th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 279–284).
Downar, J., Crawley, A. P., Mikulis, D. J., & Davis, K. D. (2000). A multimodal cortical network for the detection of changes in the sensory environment. Nature Neuroscience, 3(3), 277–283.
Doya, K., & Uchibe, E. (2005). The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior, 13(2), 149–160.
Doya, K., & Yoshizawa, S. (1989). Memorizing oscillatory patterns in the analog neuron network. Proceedings of the 1989 International Joint Conference on Neural Networks, I, 27–32.
Dreyfus, H. L., & Dreyfus, S. E. (1988). Making a mind versus modeling the brain: Artificial intelligence back at a branch point. Daedalus, 117(1), 15–43.
Dreyfus, H. L. (1991). Being-in-the-world: A commentary on Heidegger's Being and Time. Cambridge, MA: MIT Press.
Du, J., & Poo, M. (2004). Rapid BDNF-induced retrograde synaptic modification in a developing retinotectal system. Nature, 429, 878–883.
Eagleman, D. M., & Sejnowski, T. J. (2000). Motion integration and postdiction in visual awareness. Science, 287(5460), 2036–2038.
Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.
Ehrsson, H., Fagergren, A., Johansson, R., & Forssberg, H. (2003). Evidence for the involvement of the posterior parietal cortex in coordination of fingertip forces for grasp stability in manipulation. Journal of Neurophysiology, 90, 2978–2986.
Eliasmith, C. (2014). How to build a brain: A neural architecture for biological cognition. New York: Oxford University Press.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3), 195–225.
Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2008). Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot. The International Journal of Robotics Research, 27(2), 213–228.
Eskandar, E., & Assad, J. (1999). Dissociation of visual, motor and predictive signals in parietal cortex during visual guidance. Nature Neuroscience, 2, 88–93.
Evans, G. (1982). The varieties of reference. Oxford: Clarendon Press.
Fitzsimonds, R., Song, H., & Poo, M. (1997). Propagation of activity-dependent synaptic depression in simple neural networks. Nature, 388, 439–448.
Fleischer, J., Gally, J., Edelman, J., & Krichmar, J. (2007). Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device. Proceedings of the National Academy of Sciences of the USA, 104(9), 3556–3561.
Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critique. Cognition, 28, 3–71.
Fogassi, L., Ferrari, P., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005). Parietal lobe: From action organization to intention understanding. Science, 308, 662–667.
Freeman, W. (2000). How brains make up their minds. New York: Columbia University Press.
Fried, I., Katz, A., McCarthy, G., Sass, K. J., Williamson, P., Spencer, S. S., & Spencer, D. D. (1991). Functional organization of human supplementary motor cortex studied by electrical stimulation. Journal of Neuroscience, 11, 3656–3666.
Friston, K. (1998). The disconnection hypothesis. Schizophrenia Research, 30(2), 115–125.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.
Frith, C. D., & Frith, U. (2012). Mechanisms of social cognition. Annual Review of Psychology, 63, 287–313.
Fukushima, Y., Tsukada, M., Tsuda, I., Yamaguti, Y., & Kuroda, S. (2007). Spatial clustering property and its self-similarity in membrane potentials of hippocampal CA1 pyramidal neurons for a spatio-temporal input sequence. Cognitive Neurodynamics, 1, 305–316.
Gallagher, S. (2000). Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences, 4(1), 14–21.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2, 493–501.
Gaussier, P., Moga, S., Quoy, M., & Banquet, J. P. (1998). From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence, 12(7–8), 701–727.
Georgopoulos, A. P., Kalaska, J. F., Caminiti, R., & Massey, J. T. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. The Journal of Neuroscience, 2, 1527–1537.
Gershkoff-Stowe, L., & Thelen, E. (2004). U-shaped changes in behavior: A dynamic systems perspective. Journal of Cognition and Development, 5, 11–36.
Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development. New York: Oxford University Press.
Gibson, J. J. (1986). The ecological approach to visual perception. Boston: Houghton Mifflin.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349(6305), 154–156.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv preprint arXiv:1410.5401.
Graziano, M., Taylor, C., & Moore, T. (2002). Complex movements evoked by microstimulation of precentral cortex. Neuron, 34, 841–851.
Gunji, Y., & Konno, N. (1991). Artificial life with autonomously emerging boundaries. Applied Mathematics and Computation, 43, 271–298.
Haggard, P. (2008). Human volition: Towards a neuroscience of will. Nature Reviews Neuroscience, 9(12), 934–946.
Haken, H. (1983). Advanced synergetics. Berlin: Springer.
Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., & Ng, A. Y. (2014). Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.
Harnad, S. (1992). Connecting object to symbol in modeling cognition. In A. Clarke & R. Lutz (Eds.), Connectionism in context. Berlin: Springer Verlag.
Haruno, M., Wolpert, D. M., & Kawato, M. (2003). Hierarchical MOSAIC for movement generation. In International Congress Series (Vol. 1250, pp. 575–590). Amsterdam: Elsevier.
Harris, K. (2008). Stability of the fittest: Organizing learning through retroaxonal signals. Trends in Neurosciences, 31(3), 130–136.
Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. The Journal of Neuroscience, 28(10), 2539–2550.
Hauk, O., Johnsrude, I., & Pulvermuller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41(2), 301–307.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569–1579.
Heidegger, M. (1962). Being and time (J. Macquarrie & E. Robinson, Trans.). London: SCM Press.
Molesworth, W. (1841). The English works of Thomas Hobbes (Vol. 5). London: J. Bohn.
Hinton, G., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Husserl, E. (1964). The phenomenology of internal time consciousness (J. S. Churchill, Trans.). Bloomington, IN: Indiana University Press.
Husserl, E. (1970). Logical investigations (Vol. 1). London: Routledge & Kegan Paul.
Husserl, E. (2002). Studien zur Arithmetik und Geometrie. New York: Springer-Verlag.
Hyvarinen, J., & Poranen, A. (1974). Function of the parietal associative area 7 as revealed from cellular discharges in alert monkeys. Brain, 97, 673–692.
Hwang, J., Jung, M., Madapana, N., Kim, J., Choi, M., & Tani, J. (2015). Achieving "synergy" in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination. In Proceedings of the 2015 IEEE-RAS 15th International Conference on Humanoid Robots (pp. 817–824).
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of imitation. Science, 286, 2526–2528.
Ijspeert, A. J. (2001). A connectionist central pattern generator for the aquatic and terrestrial gaits of a simulated salamander. Biological Cybernetics, 84, 331–348.
Ikeda, K., Otsuka, K., & Matsumoto, K. (1989). Maxwell-Bloch turbulence. Progress of Theoretical Physics, 99, 295–324.
Ikegami, T., & Iizuka, H. (2007). Turn-taking interaction as a cooperative and co-creative process. Infant Behavior and Development, 30(2), 278–288.
Ikegami, T. (2013). A design for living technology: Experiments with the mind time machine. Artificial Life, 19(3–4), 387–400.
Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., et al. (2004). Synfire chains and cortical songs: Temporal modules of cortical activity. Science, 304, 559–564.
Iriki, A., Tanaka, M., & Iwamura, Y. (1996). Coding of modified body schema during tool use by macaque postcentral neurones. Neuroreport, 7(14), 2325–2330.
Ito, M. (1970). Neurophysiological basis of the cerebellar motor control system. International Journal of Neurology, 7, 162–176.
Ito, M. (2005). Bases and implications of learning in the cerebellum—adaptive control and internal model mechanism. Progress in Brain Research, 148, 95–109.
Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system. Adaptive Behavior, 12(2), 93–115.
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless telecommunication. Science, 308, 78–80.
Jaeger, H., Lukoševičius, M., Popovici, D., & Siewert, U. (2007). Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 20(3), 335–352.
James, W. (1884). The dilemma of determinism. Unitarian Review (Vol. XXII, p. 193). Reprinted (1956) in The will to believe (p. 145). Mineola, NY: Dover Publications.
James, W. (1892). The stream of consciousness. Cleveland, OH: World.
James, W. (1918). The principles of psychology (Vol. 1). New York, NY: Henry Holt.
Jeannerod, M. (1994). The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17, 187–202.
Johnson-Pynn, J., Fragaszy, D. M., Hirsh, E. M., Brakke, K. E., & Greenfield, P. M. (1999). Strategies used to combine seriated cups by chimpanzees (Pan troglodytes), bonobos (Pan paniscus), and capuchins (Cebus apella). Journal of Comparative Psychology, 113(2), 137–148.
Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (pp. 531–546). Hillsdale, NJ: Erlbaum.
Jung, M., Hwang, J., & Tani, J. (2015). Self-organization of spatio-temporal hierarchy via learning of dynamic visual image patterns on action sequences. PLoS One, 10(7), e0131214.
Kaneko, K. (1990). Clustering, coding, switching, hierarchical ordering and control in a network of chaotic elements. Physica D, 41, 137–172.
Karmiloff-Smith, A. (1992). Beyond modularity: A developmental perspective on cognitive science. Cambridge, MA: MIT Press.
Kawato, M. (1990). Computational schemes and neural network models for formation and control of multijoint arm trajectory. In T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 197–228). Cambridge, MA: MIT Press.
Kelso, S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kiebel, S., Daunizeau, J., & Friston, K. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4, e1000209.
Kimura, H., Akiyama, S., & Sakurama, K. (1999). Realization of dynamic walking and running of the quadruped using neural oscillator. Autonomous Robots, 7(3), 247–258.
Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83, B35–B42.
Klahr, D., Chase, W. G., & Lovelace, E. A. (1983). Structure and process in alphabetic retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(3), 462.
Kolen, J. F. (1994). Exploring computational complexity of recurrent neural networks (PhD thesis, The Ohio State University).
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37(2), 333–346.
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
Krichmar, J. L., & Edelman, G. M. (2002). Machine psychology: Autonomous behavior, perceptual categorization and conditioning in a brain-based device. Cerebral Cortex, 12, 818–830.
Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10, 799–822.
Kuniyoshi, Y., Ohmura, Y., Terada, K., Nagakubo, A., Eitoku, S. I., & Yamamoto, T. (2004). Embodied basis of invariant features in execution and perception of whole-body dynamic actions—knacks and focuses of Roll-and-Rise motion. Robotics and Autonomous Systems, 48(4), 189–201.
Kuniyoshi, Y., & Sangawa, S. (2006). Early motor development from partially ordered neural-body dynamics—experiments with a cortico-spinal-musculo-skeletal model. Biological Cybernetics, 95, 589–605.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1–64.
Laird, J. E. (2008). Extending the Soar cognitive architecture. Frontiers in Artificial Intelligence and Applications, 171, 224.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Li, W., Piëch, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron, 50(6), 951–962.
Libet, B. (1985). Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences, 8, 529–539.
Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes memorized movement sequences. Neuron, 45, 967–973.
Luria, A. (1973). The working brain. London: Penguin Books.
McCarthy, J. (1963). Situations, actions and causal laws. Stanford Artificial Intelligence Project, Memo 2. Stanford University.
Markov, A. (1971). Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Dynamic Probabilistic Systems, 1, 552–577.
Markram, H., Muller, E., Ramaswamy, S., Reimann, M. W., Abdellah, M., Sanchez, C. A., … & Kahou, G. A. A. (2015). Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2), 456–492.
Matarić, M. (1992). Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8(3), 304–312.
Matsuno, K. (1989). Physical basis of biology. Boca Raton, FL: CRC Press.
Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition. Netherlands: Springer.
May, R. M. (1976). Simple mathematical models with very complicated dynamics. Nature, 261(5560), 459–467.
Meeden, L. (1996). An incremental approach to developing intelligent neural network controllers for robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 474–485.
Merleau-Ponty, M. (1962). Phenomenology of perception (C. Smith, Trans.). London: Routledge & Kegan Paul.
Merleau-Ponty, M. (1968). The visible and the invisible: Followed by working notes (Studies in Phenomenology and Existential Philosophy). Evanston, IL: Northwestern University Press.
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 75–78.
Meltzoff, A. N. (2005). Imitation and other minds: The "like me" hypothesis. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From cognitive neuroscience to social science (pp. 55–77). Cambridge, MA: MIT Press.
Metta, G., Natale, L., Nori, F., Sandini, G., Vernon, D., Fadiga, L., et al. (2010). The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks, 23(8–9), 1125–1134.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.
Morimoto, J., & Doya, K. (2001). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36(1), 37–51.
Mormann, F., Kornblith, S., Quiroga, R. Q., Kraskov, A., Cerf, M., Fried, I., & Koch, C. (2008). Latency and selectivity of single neurons indicate hierarchical processing in the human medial temporal lobe. Journal of Neuroscience, 28, 8865–8872.
Mulliken, G. H., Musallam, S., & Andersen, R. A. (2008). Forward estimation of movement state in posterior parietal cortex. Proceedings of the National Academy of Sciences of the USA, 105(24), 8170–8177.
Murata, S., Yamashita, Y., Arie, H., Ogata, T., Sugano, S., & Tani, J. (2015). Learning to perceive the world as probabilistic or deterministic via interaction with others: A neuro-robotics experiment. IEEE Transactions on Neural Networks and Learning Systems [2015 Nov 18; epub ahead of print]. DOI: 10.1109/TNNLS.2015.2492140
Mushiake, H., Inase, M., & Tanji, J. (1991). Neuronal activity in the primate premotor, supplementary, and precentral motor cortex during visually guided and internally determined sequential movements. Journal of Neurophysiology, 66(3), 705–718.
Nadel, J. (2002). Imitation and imitation recognition: Functional use in preverbal infants and nonverbal children with autism. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases (pp. 42–62). Cambridge: Cambridge University Press.
Nagai, Y., & Asada, M. (2015). Predictive learning of sensorimotor information as a key for cognitive development. In Proceedings of the IROS 2015 Workshop on Sensorimotor Contingencies for Robotics. Osaka, Japan.
Namikawa, J., Nishimoto, R., & Tani, J. (2011). A neurodynamic account of spontaneous behavior. PLoS Computational Biology, 7(10), e1002221.
Newell, A., & Simon, H. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Newell, A., & Simon, H. A. (1975). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3), 113–126.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Nicolis, G., & Prigogine, I. (1977). Self-organization in nonequilibrium systems. New York: Wiley.
Nishida, K. (1990). An inquiry into the good (M. Abe & C. Ives, Trans.). New Haven: Yale University Press.
Nishimoto, R., & Tani, J. (2009). Development of hierarchical structures for actions and motor imagery: A constructivist view from synthetic neurorobotics study. Psychological Research, 73, 545–558.
Nolfi, S., & Floreano, D. (2000). Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines. Cambridge, MA: MIT Press.
Nolfi, S., & Floreano, D. (2002). Synthesis of autonomous robots through artificial evolution. Trends in Cognitive Sciences, 6(1), 31–37.
Ogai, Y., & Ikegami, T. (2008). Microslip as a simulated artificial mind. Adaptive Behavior, 16(2–3), 129–147.
Ogata, T., Hattori, Y., Kozima, H., Komatani, K., & Okuno, H. G. (2006). Generation of robot motions from environmental sounds using inter-modality mapping by RNNPB. In Sixth International Workshop on Epigenetic Robotics, Paris, France.
Ogata, T., Yokoya, R., Tani, J., Komatani, K., & Okuno, H. G. (2009). Prediction and imitation of other's motions by reusing own forward-inverse model in robots. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (pp. 4144–4149). Kobe, Japan.
O'Regan, J. K., & Noe, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral & Brain Sciences, 24, 939–1031.
Oudeyer, P. Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.
Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks, 19(3), 254–271.
Paine, R. W., & Tani, J. (2005). How hierarchical control self-organizes in artificial adaptive systems. Adaptive Behavior, 13(3), 211–225.
Park, G., & Tani, J. (2015). Development of compositional and contextual communicable congruence in robots by using dynamic neural network models. Neural Networks, 72, 109–122.
Pepperberg, I. M., & Shive, H. R. (2001). Simultaneous development of vocal and physical object combinations by a Grey parrot (Psittacus erithacus): Bottle caps, lids, and labels. Journal of Comparative Psychology, 115(4), 376–384.
Perry, W., & Braff, D. L. (1994). Information-processing deficits and thought disorder. American Journal of Psychiatry, 151(3), 363–367.
Pfeifer, R., & Bongard, J. (2006). How the body shapes the way we think—A new view of intelligence. Cambridge, MA: MIT Press.
Piaget, J. (1951). The child's conception of the world. Rowman & Littlefield.
Piaget, J. (1962). Play, dreams, and imitation in childhood (G. Gattegno & F. M. Hodgson, Trans.). New York: Norton.
Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.
Pulvermuller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6, 576–582.
Ramachandran, V. S., & Blakeslee, S. (1998). Phantoms in the brain: Probing the mysteries of the human mind. New York: William Morrow.
Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Ritter, F. E., Baxter, G. D., Jones, G., & Young, R. M. (2000). Supporting cognitive models as users. ACM Transactions on Computer-Human Interaction, 7(2), 141–173.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661–670.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rosander, R., & von Hofsten, C. (2004). Infants' emerging ability to represent object motion. Cognition, 91, 1–22.
Rössler, O. E. (1976). An equation for continuous chaos. Physics Letters, 57A(5), 397–398.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Sakata, H., Taira, M., Murata, A., & Mine, S. (1995). Neural mechanisms of visual guidance of hand action in the parietal cortex of the monkey. Cerebral Cortex, 5(5), 429–438.
Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3, 233–242.
Scheier, C., Pfeifer, R., & Kuniyoshi, Y. (1998). Embedded neural networks: Exploiting constraints. Neural Networks, 11, 1551–1596.
Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.
Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239, 1513–1539.
Schöner, G., & Thelen, E. (2006). Using dynamic field theory to rethink infant habituation. Psychological Review, 113(2), 273–299.
Shanahan, M. (2006). A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15(2), 433–449.
Shibata, K., & Okabe, Y. (1997). Reinforcement learning when visual sensory signals are directly given as inputs. In Proceedings of the IEEE International Conference on Neural Networks (Vol. 3, pp. 1716–1720).
Shima, K., & Tanji, J. (1998). Both supplementary and presupplementary motor areas are crucial for the temporal organization of multiple movements. Journal of Neurophysiology, 80, 3247–3260.
Shima, K., & Tanji, J. (2000). Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. Journal of Neurophysiology, 84, 2148–2160.
Shimojo, S. (2014). Postdiction: Its implications on visual awareness, hindsight, and sense of agency. Frontiers in Psychology, 5, 196.
Siegelmann, H. T. (1995). Computation beyond the Turing limit. Science, 268(5210), 545–548.
Simon, H. A. (1981). The sciences of the artificial (2nd ed.). Cambridge, MA: MIT Press.
Sirigu, A., Daprati, E., Ciancia, S., Giraux, P., Nighoghossian, N., Posada, A., & Haggard, P. (2003). Altered awareness of voluntary action after damage to the parietal cortex. Nature Neuroscience, 7, 80–84.
Sirigu, A., Duhamel, J. R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental representation of hand movements after parietal cortex damage. Science, 273(5281), 1564–1568.
Smith, L., & Thelen, E. (2003). Development as a dynamic system. Trends in Cognitive Sciences, 7(8), 343–348.
Soon, C., Brass, M., Heinze, H., & Haynes, J. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11, 543–545.
Spencer-Brown, G. (1969). Laws of form. Wales, UK: George Allen and Unwin.
Spivey, M. (2007). The continuity of mind. New York: Oxford University Press.
Sporns, O. (2010). Networks of the brain. Cambridge, MA: MIT Press.
Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A neurobiological perspective. Current Opinion in Neurobiology, 5, 169–177.
Steil, J. J., Röthling, F., Haschke, R., & Ritter, H. (2004). Situated robot learning for multi-modal instruction and imitation of grasping. Robotics and Autonomous Systems, 47(2), 129–141.
Sugita, Y., & Tani, J. (2005). Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, 13(1), 33–52.
Sun, R. (2016). Anatomy of mind: Exploring psychological mechanisms and processes with the CLARION cognitive architecture. New York: Oxford University Press.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Taga, G., Yamaguchi, Y., & Shimizu, H. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environments. Biological Cybernetics, 65, 147–159.
Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science, 262, 685–688.
Tani, J. (1996). Model-based learning for mobile robot navigation from the dynamical systems perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 421–436.
Tani, J. (1998). An interpretation of the "self" from the dynamical systems perspective: A constructivist approach. Journal of Consciousness Studies, 5(5–6), 516–542.
Tani, J. (2003). Learning to generate articulated behavior through the bottom-up and the top-down interaction process. Neural Networks, 16, 11–23.
Tani, J. (2004). The dynamical systems accounts for phenomenology of immanent time: An interpretation by revisiting a robotics synthetic study. Journal of Consciousness Studies, 11(9), 5–24.
Tani, J. (2009). Autonomy of "self" at criticality: The perspective from synthetic neuro-robotics. Adaptive Behavior, 17(5), 421–443.
Tani, J. (2014). Self-organization and compositionality in cognitive brains: A neurorobotics study. Proceedings of the IEEE, 102(4), 586–605.
Tani, J., Friston, K., & Haykin, S. (2014). Self-organization and compositionality in cognitive brains [Further thoughts]. Proceedings of the IEEE, 102(4), 606–607.
Tani, J., & Fukumura, N. (1997). Self-organizing internal representation in learning of navigation: A physical experiment by the mobile robot YAMABICO. Neural Networks, 10(1), 153–159.
Tani, J., & Fukumura, N. (1993). Learning goal-directed navigation as attractor dynamics for a sensory motor system (An experiment by the mobile robot YAMABICO). Proceedings of the 1993 International Joint Conference on Neural Networks (pp. 1747–1752).
Tani, J., & Fukumura, N. (1995). Embedding a grammatical description in deterministic chaos: An experiment in recurrent neural learning. Biological Cybernetics, 72(4), 365–370.
Tani, J., & Nolfi, S. (1997). Self-organization of modules and their hierarchy in robot learning problems: A dynamical systems approach. System Analysis for Higher Brain Function Research Project News Letter, 2(4), 1–11.
Tani, J., & Nolfi, S. (1999). Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Networks, 12(7), 1131–1141.
Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks, 17, 1273–1289.
Tani, T. (1998). The physics of consciousness. Tokyo: Keiso-shobo.
Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., & Asoh, H. (2016). Symbol emergence in robotics: A survey. Advanced Robotics. DOI: 10.1080/01691864.2016.1164622
Tanji, J., & Shima, K. (1994). Role for supplementary motor area cells in planning several movements ahead. Nature, 371, 413–416.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17(2), 273–281.
Thelen, E., & Smith, L. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.
Tokimoto, N., & Okanoya, K. (2004). Spontaneous construction of "Chinese boxes" by degus (Octodon degu): A rudiment of recursive intelligence? Japanese Psychological Research, 46, 255–261.
Tomasello, M. (2009). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tononi, G. (2008). Consciousness as integrated information: A provisional manifesto. The Biological Bulletin, 215(3), 216–242.
Trevena, J. A., & Miller, J. (2002). Cortical movement preparation before and after a conscious decision to move. Consciousness and Cognition, 11(2), 162–190.
Tsuda, I., Körner, E., & Shimizu, H. (1987). Memory dynamics in asynchronous neural networks. Progress of Theoretical Physics, 78, 51–71.
Tsuda, I. (2001). Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behavioral and Brain Sciences, 24(5), 793–810.
Uddén, J., & Bahlmann, J. (2012). A rostro-caudal gradient of structured sequence processing in the left inferior frontal gyrus. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 367, 2023–2032.
Ueda, S. (1994). Experience and awareness: Exploring Nishida philosophy [Keiken to jikaku: Nishida tetsugaku no basho wo motomete]. Tokyo, Japan: Iwanami Shoten.
Ugur, E., Nagai, Y., Sahin, E., & Oztop, E. (2015). Staged development of robot skills: Behavior formation, affordance learning and imitation with motionese. IEEE Transactions on Autonomous Mental Development, 7(2), 119–139.
Van de Cruys, S., Evers, K., Van der Hallen, R., Van Eylen, L., Boets, B., de-Wit, L., & Wagemans, J. (2014). Precise minds in uncertain worlds: Predictive coding in autism. Psychological Review, 121(4), 649.
Varela, F. J., Thompson, E. T., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. Cambridge, MA: MIT Press.
Varela, F. J. (1996). Neurophenomenology: A methodological remedy to the hard problem. Journal of Consciousness Studies, 3, 330–350.
Varela, F. J. (1999). Present-time consciousness. Journal of Consciousness Studies, 6(2–3), 111–140.
von Hofsten, C., & Rönnqvist, L. (1988). Preparation for grasping an object: A developmental study. Journal of Experimental Psychology: Human Perception and Performance, 14, 610–621.
Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences (PhD thesis, Harvard University).
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
White, J. (2016). Simulation, self-extinction, and philosophy in the service of human civilization. AI & Society, 31(2), 171–190.
Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.
Williams, B. (2014). Descartes: The project of pure enquiry. London and New York: Routledge.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270–280.
Wolpert, D., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11, 1317–1329.
Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: A humanoid robot experiment. PLoS Computational Biology, 4(11), e1000220.
Yamashita, Y., & Tani, J. (2012). Spontaneous prediction error generation in schizophrenia. PLoS One, 7(5), e37843.
Yen, S. C., Baker, J., & Gray, C. M. (2007). Heterogeneity in the responses of adjacent neurons to natural stimuli in cat striate cortex. Journal of Neurophysiology, 97, 1326–1341.
Ziemke, T., & Thieme, M. (2002). Neuromodulation of reactive sensorimotor mappings as a short-term memory mechanism in delayed response tasks. Adaptive Behavior, 10(3/4), 185–199.
Zhong, J., Cangelosi, A., & Wermter, S. (2014). Toward a self-organizing pre-symbolic neural model representing sensorimotor primitives. Frontiers in Behavioral Neuroscience, 7, 22.
Index
Note: Page numbers followed by "f" and "t" denote figures and tables, respectively.

absolute flow level, 28, 39
abstract sequences, SMA encoding, 52–53
accidental generations with spontaneous variation, 225–26
actional consequences, predictive learning from, 151–73
action generation, 197–98: brain, 44–68; hierarchical mechanisms for, 50f, 52f; through hierarchy, 49–50, 50f; perception's role in, 55; sensory-motor flow mirroring, 175–98, 178f
action-related words, 67
actions (see also complex actions; goal-directed actions; intentions): categories, 196; conscious decision preceded by, 237; external inputs perturbing, 215; free will for, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f; functional hierarchy developed for, 199–218, 200f, 201f, 203f, 265; intransitive, 65; language bound to, 190–96, 192f; neurodynamics generating tasks of, 213–15, 214f; parietal cortex meeting of, 56–61, 59f; perceptual reality changed by, 60–61; primitives, 145–48, 147f; recognition's circular causality with, 149; subjective mind influenced by, 49; training influencing, 215; transitive, 65; unconscious generation of, 215; as voluntary, 39
action sequences, 15, 219–20 (see also chunking): as compositional, 229–30, 252; MTRNNs generating, 227–30, 228f, 229f; as novel, 227–30, 229f
active learner, 143
active perception, 130–31
Act-R, 14
affective developmental robotics, 261
affordance, 93–94 (see also Gibsonian approach)
agent, 31–32
alien hand syndrome, 50–51
alternative images, 71
alternative thoughts, 71
Amari, Shun-ichi, 113 (see also error back-propagation scheme)
ambiguity, 63–64
animals, recursion-like behaviors exhibited by, 10–11
A-not-B task, 98–100, 99f
appearance, 24–25, 24f
Arbib, Michael, 9–10, 58, 66, 175, 191
Arie, H., 228f
Aristotle, 4, 261
arm robot, 131–32, 131f
artificial evolution, 126–28
Asada, Minoru, 261
Ashe, J., 52–53
attractive object, 98–100, 99f
attractors, 91–97 (see also behavior attractor; limit cycle attractors): as global, 158–59, 158f; invariant set, 84, 158–59, 158f; Rössler, 90, 158; types of, 84–85, 85f, 91f
attunement, 39
authentic agent, 31–32
authentic being, 31–32, 143, 148, 171–72
authenticity, 31–32, 237–39, 238f, 267
autism, 256, 257–58
autorecovery, 155, 157, 160
back-propagation through time (BPTT), 116f
Badre, D., 206–7, 207f
Bahlmann, J., 206–7, 207f
Bak, P., 171
Ballard, Dana, 48, 60
Beer, Randall, 121, 126f, 127–28, 128f
behavior attractor, 130–31
behavior-based approach, 37
behavior-based robotics, 82, 103–9, 104f, 105f, 107f
behavior primitives, 10, 13 (see also chunks): compositions, 200–202, 200f, 201f; functional hierarchy development, 203f, 205–6; localist scheme, 200–201, 200f; MTRNN, 204–6; PB vector value assigned to, 201–2, 201f
behaviors (see also skilled behaviors): distributed representation embedding of, 180–82, 181f; as imitative, 66, 100–102, 102f; model, 191–96; as reactive, 141–42; as spontaneous, 219–30
Being and Time (Heidegger), 30
being-in-the-world, 29–32, 34
beings, 22, 24–25: as authentic, 31–32, 143, 148, 171–72; of equipment, 30–31; as inauthentic, 142–43; man reflecting on, 31; meaning of, 30
bimodal neurons, 53–54, 56–57, 208
Blakemore, S.-J., 58
Blakeslee, S., 63
blind man, 33–34, 61
Blue Brain project, 255
bodies, 33–34, 59–60, 59f (see also Cartesian dualism)
bonobos, 11
bottom-up error regression, MTRNN, 207–8, 207f, 215
bottom-up pathway, 63–64, 164–65, 205, 263
bottom-up recognition, 60–61, 197–98, 266
bound learning, 191–96, 192f, 193f, 195f
boys, 119, 119f, 120
BPTT. See back-propagation through time
brains (see also neural network models; specific brain structures): action generation, 44–68, 65f; brain science and, 43–79, 65f, 70f, 73f; chemical plant as, 4–5, 6; cognitive competencies hosted by, 10; dynamics in, 40–41; FLN component in, 11; hierarchical mechanisms, 44–54, 47f, 50f, 52f; human language-ready, 66–67, 191; intention adjustment mechanisms employed by, 60–61; mind originating in, 4–6; models, 81–83, 109–12; outcomes monitored by, 60–61; overview, 43–79; recognition in, 55–68, 59f, 62f, 65f; spatio-temporal hierarchy of, 217; symbols in, 245–47; two-stage model mechanized by, 40–41; visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
brain science, 262–67: brain and, 43–79; future directions of, 255–61; on linguistic competency, 190–91; MTRNN correspondence, 206–8, 207f
Braitenberg, Valentino, 103–6, 107, 108–9
branching: overview, 132–34, 133f; Yamabico, 152–60, 153f
Brecker, Michael, 252–53
Broca's area, 191, 196
Brooks, Rodney, 103, 106, 107–8, 107f, 125, 145
Buddhist meditation, 254–55
button press trial, 69–71, 70f
calculus of indications, 18–19
Cantor set, 158–60, 158f
carpenter, 30–31, 42, 248, 264
Cartesian dualism, 7, 16, 32–37, 36f, 149
cascaded recurrent neural network (cascaded RNN), 116f
catastrophic forgetting, 165
cats, 49
cells, 49, 51–53, 52f, 57 (see also neurons)
central pattern generators (CPGs), 126–28, 128f
cerebellum, 58, 60
CFG. See context-free grammar
Chalmers, David, 75, 172, 246
chaos, 87–90, 91f, 108, 225–27
chaotic attractor, 84, 85f
chaotic itinerancy, 168–69, 169f
chemical plant, 4–5, 6
chiasm, 35
chimps, 10–11
Chomsky, Noam, 10–12, 12f, 190–91, 244, 260 (see also faculty of language in broad sense; faculty of language in narrow sense)
chunking, 15, 262
chunks, 175–76, 197–98: junctions, 220, 225–26; MTRNN, 204–6, 222–23; QRIO, 186–87; structures, 220, 225–26
Churchland, M., 72, 175, 208, 226
circuit-level mechanisms, 76–77, 78–79
circular causality, 7, 149, 170–72, 179, 198, 240–41, 265–67: authenticity and, 237–39, 238f; criticality and, 237–39, 238f, 250–51
CLARION. See Connectionist Learning with Adaptive Rule Induction On-line
Clark, Andy, 94–95, 246
classical artificial intelligence (classical AI), 106
Cleeremans, A., 159
closed-loop mode, 178
CNN. See convolutional neural network
codevelopment process, 213
cogito, 23, 25, 29–32, 107–9, 107f (see also being)
cognitive competencies, 10
cognitive fragmentation, 257
cognitive minds, 149–50, 243–47, 253
cognitive models, 13–15, 14t
cognitive processes, 7, 155, 266 (see also embodied cognition; embodiment)
cognitivism, 244–45, 262: composition, 9–13, 12f; context, 18–19; models, 13–15, 14t; overview, 9–20; recursion, 9–13, 12f; symbol grounding problem, 15–18, 17f; symbol systems, 9–13, 12f
coherence, 169–72
collective neurons, 63, 72–73, 73f
collision-free maneuvering, 152
columnar organization, 45, 46, 46f, 49
comb, 50–51
combinatory explosion problem, 161
complex actions: developmental training of, 209–15; experiments, 209–15; QRIO, 209–15, 209f, 212f, 214f
complex object features, 46–47, 46f, 47f
complex objects, 48
complex visual objects, 46f
compositional action sequences, 229–30, 252
compositionality, 248–49, 262, 266: in cognitive mind, 243–47; development of, 152–61, 264; as fluid, 202, 216, 265; generalization and, 194–96, 195f; MTRNN, 217–18, 218f
compositions, 145–48, 147f: behavior primitives, 200–202, 200f, 201f; cognitivism, 9–13, 12f; localist scheme, 200–201, 200f; in symbol systems, 9–13, 12f
concepts, 246
concrete movements, 33–34
Connectionist Learning with Adaptive Rule Induction On-line (CLARION), 247
connection weight matrix, 121
conscious awareness, 248: free will for, 219–41, 236f, 238f; intentions, 69–75, 70f, 73f, 230–39, 236f, 238f
conscious decision, action preceding, 237
conscious memory, 27–28
consciousness, 25, 187, 250 (see also streams of consciousness): absolute flow of, 28; cogito problem concerning, 29–32; easy problem of, 75; free will and, 230–39, 236f, 238f; hard problem of, 75, 172, 249–50, 263, 267; postdiction and, 230–39, 236f, 238f; questions about, 3–4; structure of, 172; surprise quantifying, 172n2
conscious states, 37–39
consolidation, 164–69, 167f, 169f, 197, 225–26
context-free grammar (CFG), 11, 12f
contexts, 18–19, 48, 157, 158–60, 158f
continuity of minds, 100
continuous-time recurrent neural network (CTRNN), 120–25, 122f, 123f, 127, 204–5 (see also multiple-timescale recurrent neural network)
continuous-time systems, 90
contours, 47, 48
convolutional neural network (CNN), 256, 258f
corridor, 94, 95
cortical electrical stimulation study, 73–74
cortical song, 72
counting, 10–12
CPGs. See central pattern generators
creative images, 197
criticality, 237–39, 238f, 250–51
CTRNN. See continuous-time recurrent neural network
cup nesting, 11
cursor, 57
Dale, R., 146
Dasein, 34
death, 31, 171–72
deep learning, 253–54, 258–59
deep minds, 259
degu, 11
Demiris, Y., 200–201, 200f
Dennett, Daniel, 108
depression, 256
Descartes, René, 16, 29, 243 (see also Cartesian dualism)
Desmurget, M., 73–74, 75, 76, 237
D'Esposito, M., 206–7, 207f
deterministic chaos, 226–27
deterministic dynamics, 226–27
developmental psychology, 97–100, 99f
developmental training, 209–15, 209f, 212f, 214f, 259–61
Diamond, A., 211
difference equation, 83–84
dimension, 35–36, 36f
direct experiences, 22–23, 23f, 26–28, 106–8, 107f, 142–43
direct reference, 34
disconnectivity hypothesis, 257
discrete movements, 180, 180f
discrete time system, 85–90, 86f, 88f, 89f
distributed representation framework, 177, 180–82, 181f, 196–97, 201–2
disturbance of self, 256–57
Doetsch, P., 258
domain specificity, 29
do-re-mi example, 26–28
double intentionalities, 142
dreaming, 165
Dreyfus, H. L., 28–29, 41, 145
dynamical structure, 86–87, 132–36, 133f, 135f, 245
dynamical systems (see also nonlinear dynamical systems): continuous-time, 90; difference equation, 83–84; discrete time, 85–90, 86f, 88f, 89f; neurorobotics from perspective of, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f; structural stability, 90–93, 91f, 92f
dynamical systems approach, 79, 263: embodied cognition modeled by, 81–137, 126f; self-organization applied by, 7
dynamic closure, 160, 160f, 166–68
dynamic conflict resolution, 215
dynamic learning, intermittency during, 166–69, 167f, 169f
dynamic neural network models, 137, 176–79, 178f (see also recurrent neural network with parametric biases)
A Dynamic Systems Approach to the Development of Cognition and Action (Thelen and Smith), 97–98
dynamic systems theory, 83–93, 85f, 86f, 88f, 89f, 91f, 92f
easy problem, 75
echo-state network, 124–25
Edelman, G., 255
edge of chaos, 89
electroencephalography (EEG), 61, 69–71, 70f
electrophysiological experiments, 56–57
"Elephants don't play chess" (Brooks), 106
Eliasmith, C., 255
Elman, Jeffrey, 118–20, 119f
Elman net, 118–20, 119f
embodied cognition, 235–36 (see also dynamic neural network models): definition of, 82; dynamical systems approach modeling, 81–137, 85f
embodied mind, 32–37, 36f, 42, 254
embodiment, 78, 79, 107–9, 107f, 236–37: dimension of, 35–36; Gibsonian approach and, 94–95; prediction error generated by, 240
emergence through synthesis, 83
emergency shutdown, 4–5
emotions, 261
end effectors, 67
end-to-end learning, 217
entrainment, 95–96, 154–56
epoché. See suspension of disbelief
error back-propagation scheme, 113–16, 113f, 123f, 257: CTRNN application of, 204–5; perceptual sequences acquired by, 204–5, 206; retrograde axonal signaling mechanism implementing, 207–8
error regression, 231–39, 233f, 234f, 236f, 238f, 257
Evans, Gareth, 9–10
evolution, 126–32
experiences, 266 (see also direct experiences; first-person experience; pure experience; subjective experiences): continuous flow of, 26–28; perception dependence of, 23–42; of selfhood, 39
extended mind, 246
external inputs, 215
external observer, 252
facial imitation, 101
faculty of language in broad sense (FLB), 10, 12, 16
faculty of language in narrow sense (FLN), 10, 11–13, 12f, 16, 19
fallenness, 32
fast dynamics, 203f, 204, 205, 206: at M1, 206–7, 207f; QRIO, 210
feature representation, 49
feed-forward network model, 112–20, 113f, 116f, 119f, 129–30, 130f
Feynman, Richard, 81, 103
fingers, 96–97, 96f
finite state machine (FSM), 17, 88–89, 153, 160, 227
first-person experience, 106–8, 107f
fixed point attractor, 84, 85, 85f, 94
FLB. See faculty of language in broad sense
flesh, 34–36
FLN. See faculty of language in narrow sense
Floreano, D., 130–31
flow, of subjective experiences, 26–29
fluid compositionality, 202, 216, 265
fMRI. See functional magnetic resonance imaging
focus of expansion (FOE), 94, 95f
forward model, 58, 152, 161
frame problem, 59, 161, 177–78
frame system, 29
Freeman, Walter, 55, 72–73, 225, 237
free will, 69–75, 78, 218, 248–51, 261, 263, 265–67: for action, 219–41, 236f, 238f; for conscious awareness, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f; consciousness and, 230–39, 236f, 238f; consolidation, 225–26; definition of, 39; experiments, 221–25, 221f, 223f, 225f; intention correlates, 69–75, 70f, 73f; James considering, 225–26; model for, 39–41, 40f; in MTRNN model, 220–22, 221f; overview, 39; postdiction and, 230–39, 236f, 238f; stream of consciousness and, 37–41, 40f, 42; vehicle possessing, 108
Fried, I., 74, 75–76
Friston, Karl, 172n2, 179, 250, 257
frontal cortex, 207
frontopolar part of prefrontal cortex, 71–73, 73f
FSM. See finite state machine
Fukumura, Naohiro, 132
Fukushima, Y., 159–60
functional magnetic resonance imaging (fMRI), 60–61, 65, 66, 67, 70–71
Gallagher, S., 170
Gallese, Vittorio, 67, 68
gated local network models, 200, 202
gated recurrent neural networks (RNNs), 200, 222
Gaussie, 131f, 182
generalization, 194–96, 195f, 257–58
General Problem Solver (GPS), 13–15, 14t, 17, 156–57
genetic algorithm, 217
Georgopoulos, A., 49–50, 54
Gershkoff-Stowe, L., 98, 99
Gestalt. See structuring processes of whole
Gibson, Eleanor, 55, 58, 143
Gibson, J., 93–95, 95f
Gibsonian approach, 93–95, 95f, 107, 263. See also Neo-Gibsonian approaches
global attractor, 158–59, 158f
goal-directed action plans, 13, 108, 156–57, 157f
goal-directed actions, 65–66, 210–15, 212f, 214f, 253
Goldman, Alvin, 67, 68
Goodale, Mel, 56
GPS. See General Problem Solver
grammar, 11, 12–13, 12f
grandmother cells, 245–46
grasping neurons, 64–66, 65f
Graves, A., 245
Graziano, M., 54
groundlessness, 240–41, 251, 253, 264, 267–68
Gunji, Y., 252
Haas, H., 125
hair, 50–51
hallucinations, 228–29
hammer, 30–31, 42, 60, 170, 248, 264
hands, 34–35, 63
  metronome in synchrony with, 96–97, 96f
  QRIO imitating, 187–90, 189f
  QRIO predicting, 183–87, 184f, 185f
handwriting recognition system, 258
hard problem, 75, 172, 249–50, 263, 267
harmonic oscillator, 92, 93
Harnad, Steven, 16. See also symbol grounding problem
Harris, K., 124
Haruno, M., 200–201, 200f
Hauk, O., 66–67
Hayes, G., 200–201, 200f
Hebbian learning, 190–91
Heidegger, Martin, 21, 39, 41–42, 60, 61, 142–43, 162, 170, 248, 251, 264, 267. See also being-in-the-world
  on future, 235
  on past, 235
hermeneutics, 29–31
Hierarchical Attentive Multiple Models for Execution and Recognition, 200–201, 200f
hierarchical mechanisms, 44–54, 45f, 46f, 47f, 50f, 52f
hierarchical mixture, 200–201, 200f
hierarchical Modular Selection and Identification for Control (MOSAIC), 200–201, 200f
hierarchy, 49–50, 50f
hippocampus, 72–73, 73f, 109, 165, 262, 265
Hobbes, Thomas, 39, 237
Hochreiter, S., 216–17
holding neurons, 65
homeostasis principle, 182
Hopfield network, 164, 165
how pathway, 56, 63
Humanoid Brain project, 256
humanoid robots, 221–22, 221f, 227–30, 228f, 229f, 257. See also other robot; self robot
humans
  cogito level had by, 107–9, 107f
  direct experiences for, 142–43
  imitation for, 190
  intransitive actions of, 65
  language-ready brains, 66–67, 191
  linguistic competency of, 66–67, 190–91
  mirror systems in, 65, 66
  parietal cortex damage in, 57
  presupplementary motor area, 75–76
Hume, David, 170
Husserl, Edmund, 21, 22–23, 23f, 24–25, 24f, 41, 61, 106, 142–43, 145, 176, 186–87, 248–49
  on direct experiences, 26–28
  temporality notion of, 32
  on time perception, 26–29
Hwang, J., 259
hybrid system. See symbol grounding problem
ideational apraxia, 57
ideomotor apraxia, 57
Ikegami, Takashi, 137
Ikegaya, Y., 72, 109
images, 71–72, 182, 197, 266. See also motor imagery; visual imagery
imitation, 100–102, 102f, 131–32, 131f
  game, 187–90, 189f, 251
  for humans, 190
  manipulation, 221–22, 221f
  by mental state reading, 182–90, 184f, 185f, 189f
  prediction error influencing, 198
  QRIO, 187–90, 189f
imitative actions, statistical learning of, 221–22, 221f
imitative behaviors, 66
imperative sentences, 191–96, 192f, 193f, 195f
impression, 27
inauthentic agent, 31–32
inauthentic being, 142–43
incoherence, 169–72
index fingers, 96–97, 96f
indexing, 18–19
infants
  developmental psychology, 97–100, 99f
  imitation in, 100–102, 102f
  intentionality possessed by, 211
  object used by, 101–2, 102f, 143, 211
  preverbal, 101–2, 102f
inferior frontal cortex, 61
inferior parietal cortex, 183
inferior parietal lobe (IPL), 65–66
inferior temporal area (TE) (TEO), 46–47, 46f, 47f
inferotemporal cortex, 46–47, 46f, 47f
infinite regression, 19
information bottlenecks, 210
information hubs, 208, 210
information mismatch, 201, 202
information processing, 200–201
initial states, 208–15, 209f, 212f, 214f
  of intention units, 220–22, 221f
  setting, 218
inner prediction error, 257
instrumental activities, 102, 102f
“Intelligence without representation” (Brooks), 106
intentionalities, 28, 142, 161, 211, 225. See also subjectivity
intentions, 56–61, 59f, 78
  conscious awareness, 69–75, 70f, 73f, 230–39, 236f, 238f
  free will neural correlates, 69–75, 70f, 73f
  initiation of, 71–75, 73f
  intention switched to from, 144
  mirror neurons coding, 68
  organization of, 74–75
  parietal cortex as involved in, 76
  from PFC, 144
  prediction error generated by, 240
  rising of, 69–75, 70f, 73f
  spontaneous, 69–75, 70f, 73f, 230–40, 236f, 238f
  top-down subjective, 263
intention-to-perception mapping, 144
intention units, 204–6, 218, 220–22, 221f
interaction, 252–55
interactionism, problem of, 16, 243–47
intermediate dynamics, 203f, 204, 205
  MTRNN, 223f, 224
  parietal cortex, 207–8
  QRIO, 210
  VP trajectories, 213–15, 214f
intermittency, during dynamic learning, 166–69, 167f, 169f
intermittent chaos, 89, 168
intermittent transitions, 226
internal contextual dynamic structures, 132–36, 133f, 135f
internal observer, 252
intransitive actions, 65
intraparietal sulcus, 62
invariant set, 84, 158–59, 158f
IPL. See inferior parietal lobe
Iriki, Atsushi, 61–62, 62f
Ito, Masao, 58, 152
Jaeger, H., 125, 204
James, William, 21, 37–41, 40f, 42, 69, 71–72, 162, 170, 182, 250. See also streams of consciousness
  free will consideration of, 225–26
  momentary self spoken of by, 171
Jeannerod, M., 59
Johnson-Pynn, J., 11
Jordan-type recurrent neural network (Jordan-type RNN), 116f, 133f, 134
joystick task, 57
Karmiloff-Smith, A., 211
Kawato, Mitsuo, 58, 152
Kelso, Scott, 95–97, 96f
Khepera, 129–30, 129f, 130f
Kiebel, S., 206–7, 207f
kinetic melody, 202, 216
knowledge, 57, 266
Kohonen network, 164, 227–30, 228f, 229f
Kourtzi, Z., 48
Kugler, 95–96
Kuniyoshi, Y., 128–30, 129f, 130f, 175–76
Laird, John, 15, 246–47
landmark-based navigation, mobile robot performing, 162–72, 163f, 167f, 169f, 248
landmarks, 17–18, 17f, 170–71
language, action bound to, 190–96, 192f
language-ready brains, 66–67, 191
latent learning, 161
lateral intraparietal area (LIP), 46
Lateralized Readiness Potential, 70
learnable neurorobots, 141
learning, 259–61. See also consolidation; deep learning; dynamic learning; error back-propagation scheme; imitation; predictive learning
  bound, 191–96, 192f, 193f, 195f
  as end-to-end, 217
  Hebbian, 190–91
  of imitative actions, 221–22, 221f
  as latent, 161
  offline processes, 197–98
  in RNNPB, 177–82, 178f, 181f
  as statistical, 221–22, 221f
lesion, 224, 225f
Li, W., 48
Libet, Benjamin, 69–71, 70f, 218, 219, 220, 223, 230, 235, 240, 249, 263
like me mechanism, 101, 132, 183, 187, 190
limbs, 33, 62–63, 73–74
limit cycle attractors, 84, 85, 85f, 92–93
  locomotion evolution with, 126–28, 128f
  in MTRNN, 213
  periodicity of, 166–68
limit torus, 84, 85f
linguistic competency, 66–67, 190–91
LIP. See lateral intraparietal area
local attractors, 84, 85, 85f
localist scheme, 196–97, 200–201, 200f
local representation framework, 177
locomotion, limit cycle attractor evolution with, 126–28, 128f. See also walking
locomotive controller, 127–28
logistic maps, 85–89, 86f, 88f, 89f, 90, 108
longitudinal intentionality, 28
long-term and short-term memory recurrent neural network (RNN) model, 216–17
look-ahead prediction, Yamabico, 154–57, 155f, 157f
Lu, X., 52–53
Luria, Alexander, 202, 216
Lyapunov exponent, 224
M1. See primary motor cortex
macaque monkeys, 45f, 46
Mach, Ernst, 22–23, 23f
man, 31, 33–34, 61
manipulation, 63–64
  imitation, 221–22, 221f
  of QRIO, 209–15, 209f, 212f, 214f
  symbol, 145–48, 147f
  tutored sequences, 227–30, 228f, 229f
  of visual objects, 56–57
Markov chains, 226–27
Massachusetts Institute of Technology (MIT), 103
Matarić, M., 108
matching, 188
Matsuno, K., 252
Maturana, H., 117, 132
May, Robert, 85. See also logistic maps
meanings, 195–96
medial superior temporal area (MST), 45–46
medial temporal lobe, 246
melody, 26–28
Meltzoff, A., 101, 183, 187
memory cells, 217
mental rehearsal, 164–69, 167f, 169f
mental simulation, 154, 156
mental states, imitating others by reading, 182–90, 184f, 185f, 189f
Merleau-Ponty, Maurice, 21, 25, 32–37, 36f, 42, 61–64, 78, 144, 237, 244. See also embodiment; Schneider
middle temporal area (MT), 45
middle way, 254–55
Miller, J., 70
Milner, David, 56
miming, 57
mind/body dualism. See Cartesian dualism
mind-reading, 68
minds, 3–8. See also cognitive minds; consciousness; embodied cognition; subjective mind
  continuity of, 100
  deep, 259
  embodiment of, 32–37, 36f, 42, 254
  as extended, 246
  overview, 262–67
  theory of, 67
minimal cognition, 126
minimal self, 169–72
Minsky, Marvin, 29
mirror box, 63
mirror neurons, 55–56, 261. See also recurrent neural network with parametric biases
  dynamic neural network model for, 176–79, 178f
  evidence for, 64–67, 65f
  grasping, 64–66, 65f
  holding, 65
  implementation, 67–68
  intention coded by, 68
  IPL, 65–66
  model, 177–79, 191–96, 192f, 193f, 195f
  of monkeys, 76
  overview, 64–68, 65f
  in parietal cortex, 76, 177
  tearing, 65
mirror systems, in humans, 65, 66
mismatches, 60–61, 201, 202
MIT. See Massachusetts Institute of Technology
mixed pattern generator, 128, 128f
mobile robots, 16–18, 17f. See also Yamabico
  example, 5–6, 16–18, 17f
  landmark-based navigation performed by, 162–72, 163f, 248
  in office environment, 16–18, 17f
  problem, 16–18, 17f
  with vision, 162–72, 163f, 167f, 169f, 173, 193–96, 193f
models, 57
modularity, 44–49, 45f, 46f, 47f
momentary self, 170, 171, 173, 264
monkey–banana problem, 14–15, 14t
monkeys, 45f, 46, 48, 50, 51–52, 52f, 53–54, 57
  inferior parietal cortex of, 183
  IPL of, 65–66
  mirror neurons of, 76
  motor cortex of, 208
  motor neurons of, 183
  parietal cortex of, 61–62, 62f, 76
  PMC of, 208
  PMv controlling, 64–65, 65f
  presupplementary motor area, 75–76
  primitive movements of, 75–76
Moore, M., 101
moral virtue, 261
Mormann, F., 246
mortality, 32
motifs, 72
motor cortex, 208, 222–23
motor imagery, 59, 206, 211, 222–24, 223f
motor neurons, of monkeys, 183
motor programs, 208–15, 209f, 212f, 214f
motor schemata theory, 9–10, 175–76
movements
  discrete, 180, 180f
  parietal cortex, 73–74
  patterns, 180–82, 181f, 187–90, 189f, 213–15, 214f
  PMC, 73
MST. See medial superior temporal area
MSTNN. See multiple spatio-temporal neural network
MT. See middle temporal area
MTRNNs. See multiple-timescale recurrent neural networks
Mulliken, G. H., 57
multiple spatio-temporal neural network (MSTNN), 217, 218f
multiple-timescale recurrent neural networks (MTRNNs), 252, 257, 265
  action sequences generated by, 227–30, 228f, 229f
  behavior primitives, 204–6
  bottom-up error regression, 207–8, 207f, 215
  brain science correspondence, 206–8, 207f
  chunks, 204–6, 222–23
  compositionality, 217–18, 218f
  experiment, 208–15, 209f, 212f, 214f, 230–35, 233f, 234f
  free will in, 220–22, 221f
  limit-cycle attractors in, 213
  motor imagery generated by, 206
  overview, 203–8, 203f, 207f, 216–18, 218f
  perceptual sequences, 204–6
  recognition performed by, 206
  RNNPB as analogous to, 229–30
  top-down forward prediction, 215
  top-down pathway, 207–8, 207f
  tutoring, 237–39, 238f
Mu-ming Poo, 124
Murata, A., 233f
Mushiake, H., 53–54
mutual imitation game, 187–90, 189f
Nadel, Jacqueline, 101–2, 102f, 131, 188
Namikawa, J., 221f, 223f, 225f
navigation, 251. See also landmark-based navigation; mobile robots
  dynamical structure in, 132–36, 133f, 135f
  internal contextual dynamic structures in, 132–36, 133f, 135f
  problem, 107–9, 107f
  self-organization in, 132–36, 133f, 135f
  Yamabico experiments, 132–36, 153–62
Neo-Gibsonian approaches, 95–97, 96f, 144, 263
neonates, 101
neural activation sequences, 222–24, 223f
neural activation state, 208
neural circuits, 117, 132–36, 133f, 135f
neural correlates, 76–77, 78–79
neural network models, 255–56. See also feed-forward network model
  overview, 112–25, 113f, 116f, 119f, 122f, 123f
  types of, 112–25, 113f, 116f, 119f, 122f, 123f
neurodynamic models, subjective views in, 143–48, 145f, 147f
neurodynamic structure, 157–59, 158f
neurodynamics with timescales, 213–15, 214f
neurodynamic system, 145–48, 147f
neurons, 46. See also mirror neurons; motor neurons; neural network models
  bimodal, 53–54, 56–57, 208
  collective, 63, 72–73, 73f
  hard problem, 75
  as motifs, 72
  PMC, 72–73, 73f
  postsynaptic, 124
  presynaptic, 124
  as spiking, 109–10, 255–56
  V1, 48
neuro-phenomenological robotics, 256–57
neurophenomenology program, 254
neurorobotics
  from dynamical systems perspective, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f
  model, 257–59
neuroscience. See brain science
Newell, Allen, 13, 15. See also General Problem Solver
newness, 248–49
Nishida, Kitaro, 21, 22–23, 25
Nishimoto, R., 209f, 212f
Nolfi, S., 130–31, 200–201, 200f
nonlinear dynamical systems, structural stability of, 90–93, 91f, 92f. See also logistic maps
nonlinear dynamics, 83–93
nonlinear mapping, tangency in, 89, 89f
nouvelle artificial intelligence (nouvelle AI), 106
novel action sequences, 227–30, 228f, 229f
nowness, 27, 171–72, 186–87, 235
objectification, 26–29
objective science, subjective experience and, 251–55, 267
objective time, 27–28, 38–39
objective world, 266–67
  phenomenology, 7, 23–42, 24f, 36f, 40f
  subjective mind as tied to, 49, 148–50, 149f
  subjective mind's distinction from, 7, 23–42, 24f, 36f, 40f
  subjectivity as mirror of, 172
objectivity, 250, 254–55
objects. See also manipulation; tools; visual objects
  as attractive, 98–100, 99f
  chimps and, 10–11
  complex, 46f
  counting of, 10–12
  features, 46f
  infants using, 101–2, 102f, 143, 211
  perception of, 33–37, 36f
  shaking, 144–45, 145f
  skilled behaviors for manipulating, 57–61, 59f
  subject as separated from, 22–23
  subject iterative exchanges, 36
  subject's unified existence with, 25, 36, 244, 248
  as three-dimensional, 35–36, 36f
  as two-dimensional, 35–36, 36f
offline learning processes, 197–98
offline look-ahead prediction. See look-ahead prediction
one-step prediction, 154–55, 155f, 232
online prediction, 153–54
open-loop mode, 178
operating system (OS), 79
optical constancy, 95f
optical flow, 94, 95f
OS. See operating system
other robot, 231–35, 233f, 234f
outfielder, 94–95
overregularization, 98
Oztop, E., 58
palpation, 34–35, 144
Parallel Distributed Processing (PDP) Research Group, 196
parametric bias (PB), 177–82, 178f, 181f, 215
  activations, 191–93, 192f
  prediction error, 198
  self-organization, 192
  vectors, 183–87, 185f, 191–93, 192f, 194–96, 195f, 197, 201–2, 201f, 204
parietal cortex, 55, 78, 237, 266. See also precuneus
  action intention meeting of, 56–61, 59f
  bimodal neurons in, 208
  cells, 57
  damage to, 57
  as information hub, 208
  intention involvement of, 76
  intermediate dynamics, 207–8
  mirror neurons in, 76, 177
  of monkeys, 61–62, 62f, 76
  movements, 73–74
  overview, 56–61, 59f
  perceptual outcome meeting of, 56–61, 59f
  perceptual structures in, 144
  predictive model in, 59f, 68
  stimulation of, 73–74
  visual objects involvement of, 56–57
parrots, 11
past, 235
pastness, 27
PB. See parametric bias
PCA. See principal component analysis
PDP Research Group. See Parallel Distributed Processing Research Group
perception. See also active perception; what pathway; where pathway
  action changing reality of, 60–61
  action generation role of, 55
  cogito as separate from, 25
  experience as dependent on, 23–42, 24f, 36f, 40f
  intention altered by reality of, 60–61
  of objects, 33–37, 36f
  outcome, 56–61, 59f
  parietal cortex meeting of, 56–61, 59f
  of square, 24–25, 24f
  of time, 26–29, 176, 186–87, 248–49
perception-to-action mapping, 144
perception-to-motor cycle, 106, 107
perceptual constancy, 95
perceptual flows, 258
perceptual sequences, 177–79, 178f, 203f, 204–6
perceptual structures, in parietal cortex, 144
perchings, 38, 226
periodicity, of limit cycle attractors, 166–68
perseverative reaching, 98–100, 99f
PFC. See prefrontal cortex
Pfeifer, R., 128–30, 129f, 130f
phantom limbs, 33, 62–63
phase transitions, 95–97, 96f, 144
phenomenological reduction, 21
phenomenology, 20
  being-in-the-world, 29–32
  direct experience in, 22–23, 23f
  embodiment of mind, 32–37, 36f
  objectification, 26–29
  objective world, 7, 23–42, 24f, 36f, 40f
  overview, 21–42, 23f, 24f, 36f, 40f, 247–51
  subjective experiences, 26–29
  subjective mind, 7, 23–42, 24f, 36f, 40f
  time perception, 26–29, 176, 186–87, 248–49
Piaget, Jean, 98–101, 99f, 102, 260
Pick, Anne, 55, 58, 143
pilots, 94
PMC. See premotor cortex
PMv. See ventral premotor area
Poincaré section, 90, 91f
polarity, 25
poles, 36
Pollack, J., 159
postdiction, 230–39, 233f, 234f, 236f, 238f
postsynaptic neurons, 124
posture, 59–60, 59f
poverty of stimulus problem, 193, 260
precuneus, 71–73, 73f
prediction. See also one-step prediction
  errors, 166–72, 167f, 169f, 192–93, 198, 206, 207–8, 231–32, 236, 240, 257–58
  as offline, 154
  as online, 153–54
  RNNs as responsible for, 186
  of sensation, 153–57, 153f, 155f, 157f
  top-down, 60–61, 164–65, 197–98
  Yamabico, 153–57, 153f, 155f, 157f
predictive coding, 48, 191–96, 192f, 193f, 195f
predictive dynamics, self-consciousness and, 161–72, 163f, 167f, 169f
predictive learning
  from actional consequences, 151–73
about world, 151–73, 153f, 155f, 157f, 158f, 160f, 163f, 167f, 169f predictive model, 57–64, 59f, 62f, 68 preempirical time, 2 6–27 prefrontal cortex ( PFC), 144, 206–7, 207f, 225–26, 266. See also frontopolar prefrontal cortex part of premotor cortex (PMC), 49– 54, 50 f, 52f, 77–78 of monkey, 208 movements, 73 neurons, 72–73, 73f role of, 76 stimulations of, 73 present, 31–32 presentness, 27 presupplementary motor area, d irect stimulation of, 74–76 presynaptic neurons, 124 pretend play, 101 preverbal infants, 101–2, 102f primary motor cortex (M1) , 49– 53, 50 f, 52f, 54, 60, 77–78 faster dynamics at, 206– 7, 207f SMA and, 206– 7, 207f primary visual cortex (V1), 44– 45, 48 primitive actions, stochastic transitions between, 221–22, 221f
problem of interactionism, 16, 243–47 proprioception, 59– 60, 59 f, 179, 183–87, 184f, 185f protention, 26– 27, 61, 186, 198 protosigns, 66 – 67 Pulvermuller, F., 191
primitive movements, 51–52, 52f, 53, 75–76 principal component analysis (PCA), 211 Principles of Psychology (James), 37–39 private states of consciousness, 38– 39 probabilist ic processes, 226– 27
recognition action's circular causality with, 149 bottom-up, 60– 61, 197–98, 266 in brain, 55– 68, 59f, 62f, 65f of landmarks, 170–71 MTRN Ns performin g, 206 of perceptual sequences, 206 reconstruction, 22
pure26– experience, 28, 39 22– 23, Quest for cuRIOsity (QRIO), 183– 90, 184f, 189f complex actions, 209 –15, 209f, 212f, 214f developmental tra ining, 209 –15, 209 f, 212f, 214f fast dynamics, 210 intermediate dynamics, 210 manipulation of, 209 –15, 209f, 212f, 214f slow dynamics, 210 rake, 62 Ramachandran, V., 63 Rao, Rajesh, 48, 60 rapid eye movement (REM) sleep phase, 165 rats, hippocampus of, 72–73, 73f reactive behaviors, 141–42 Readiness Potential (RP), 69–71, 70f recognition, 22. See also visual
recurrent neural networks (RNNs), 111–12, 150, 245. See also cascaded recurrent neural network; Jordan-type recurrent neural network
  as forward dynamics model, 152
  as gated, 222
  models, 116–20, 116f, 119f, 124, 202, 216–17, 264
  prediction responsibility of, 186
  Yamabico, 153–61, 153f, 155f, 157f, 160f
recurrent neural network with parametric biases (RNNPB), 177–79, 205. See also parametric bias
  characteristics of, 179–82, 181f
  distributed representation characteristics in, 197
  frame problem avoided by, 177–78
  learning in, 177–82, 178f, 181f
  models, 191–98, 192f, 193f, 195f, 201–2, 201f, 204, 229–30, 248–49, 264–65
  MTRNN as analogous to, 229–30
  overview, 176–79, 178f
  segmentation of, 186–87
  system flow of, 178f
recursion, 9–13, 12f
reflective pattern generator, 127
reflective selves, 216
refrigerator, 5, 19, 41–42
refusal of deficiency, 33
rehearsal, 164–69, 167f, 169f
REM phase. See rapid eye movement sleep phase
representation, 25, 28, 106, 108, 145–48, 147f
response facilitation with understanding meaning, 183
retention, 26–27, 186, 198
retrograde axonal signal, 124
retrograde axonal signaling mechanism, 124, 207–8
Rizzolatti, G., 64–66, 65f, 76, 182–83
RNNPB. See recurrent neural network with parametric biases
RNNs. See recurrent neural networks
robotics, 5–6, 261. See also behavior-based robotics; neurorobotics
robots. See also arm robot; behavior-based robotics; mobile robots; other robot; self robot
  Cartesian dualism freedom of, 149
  as humanoid, 221–22, 221f, 227–30, 228f, 229f, 257
  Khepera, 129–30, 129f, 130f
  navigation problem, 107–9, 107f
  reflective selves of, 216
  as self-narrative, 206, 216, 249
  with subjective views, 141–43
  walking of, 126–28, 128f
Rössler attractor, 90, 158
Rössler system, 90, 91f
rostral-caudal gradient, 206–7, 207f
RP. See Readiness Potential
rules, 11, 12f, 14–15, 14t
Rumelhart, D., 113. See also error back-propagation scheme
Sakata, Hideo, 57
sand pile behavior, 171
scaffolding, 260
Scheier, C., 128–30, 129f, 130f
schizophrenia, 256–57
Schmidhuber, J., 216–17
Schneider, 33–34, 56
see-ers, 34–35
segmentation, 176, 186–87
self-consciousness, 161–72, 163f, 167f, 169–70, 169f
selfhood, 39
self-organization, 7, 98, 130, 202, 244–45
  in bound learning process, 194–96, 195f
  of dynamical structure, 132–36, 133f, 135f, 245
  dynamical systems approach applying, 7
  of functional hierarchy, 203–8, 203f, 207f
  multiple timescales, 203–8, 203f, 207f
  in navigation, 132–36, 133f, 135f
  PB, 192
self-organized criticality (SOC), 171–72, 188–90, 264, 267
self robot, 231–35, 233f, 234f
selves, 248, 264
  disturbance of, 256–57
  as minimal, 169–72
  momentary, 170, 171, 173, 264
  range of, 33
  as reflective, 216
semantically combinatorial language of thought, 145
sensationalism, 24
sensations, prediction of, 153–57, 153f, 155f, 157f. See also synesthesia
sensory aliasing problem, 134
sensory cortices, 63–64
sensory-guided actions, in PMC, 53–54
sensory-motor coordination, 128–32, 129f, 130f, 131f
sensory-motor flow
  action generation mirrored by, 175–98, 178f
  articulating, 175–98, 185f
sensory-motor sequences model, 191–96
sentences, 190
  Elman net generating, 118–20, 119f
  model, 191–96, 192f
  recursive structure of, 11, 12f
sequence patterns, 177–79, 178f. See also recurrent neural network with parametric biases
sequential movements, 53–54
shaking, 144–45, 145f
Shima, K., 51–52, 52f, 53, 76, 206
short-term memory (STM), 247
Siegelmann, Hava, 112, 244–45
Simon, Herbert, 13. See also General Problem Solver
simulation theory, 67, 68
single-unit recording, 46
sinusoidal function, 92
Sirigu, A., 58, 59, 76
skilled behaviors, 50–51, 57–61, 59f
slow dynamics, 203–4, 203f, 205, 206. See also intention units
  MTRNN, 223f, 224
  at PFC, 206–7, 207f
  QRIO, 210
SMA. See supplementary motor area
Smith, Linda, 97–98, 211
Soar, 14, 15, 246–47
SOC. See self-organized criticality
Soon, C., 70–71, 74, 218, 219, 220, 223, 230, 240
speech recognition system, 258
Spencer-Brown, G., 18–19
spiking neurons, 109–10, 255–56. See also neural network models
Spivey, M., 100, 146
spoken grammar, 12–13
spontaneity, 226–27
spontaneous behaviors, 219–30
spontaneous generation of intention, 69–75, 70f, 73f, 230–40, 236f, 238f
  overview, 71–72
staged development, 260
statistical learning, of imitative actions, 221–22, 221f
steady phase, 168, 169, 170
STM. See short-term memory
stochastic transitions between primitive actions, 221–22, 221f
streams of consciousness, 170
  characteristics of, 37–39
  definition of, 37
  flights, 38, 226
  free will and, 37–41, 40f, 42
  images in, 182
  overview, 37–41, 40f
  perchings, 38, 226
  states, 37–39
stretching and folding, 87, 88f
structural stability, 90–93, 91f, 92f
structuring processes of whole (Gestalt), 34
subject
  object as separated from, 22–23
  object iterative exchanges, 36
  object's unified existence with, 25, 36, 244, 248
subjective experiences, 26–29, 251–55, 267
subjective mind
  actions influencing, 49
  objective world as tied to, 49, 148–50, 149f
  objective world's distinction from, 7, 23–42, 24f, 36f, 40f
  phenomenology, 7, 23–42, 24f, 36f, 40f
subjective sense of time, 32
subjective views, 141–48, 145f, 147f
subjectivity, 141, 172, 250, 254–55, 266–67
subrecursive functions, 13
substantial parts, 226
subsumption architecture, 107–9, 107f
Sugita, Yuuya, 191–96, 192f, 193f, 195f
Sun, Ron, 247
superior parietal cortex. See precuneus
supplementary motor area (SMA), 49–54, 50f, 52f, 61, 63–64, 77–78
  EEG activity, 70–71
  M1 and, 206–7, 207f
surprise, 172n2
suspension of disbelief (epoché), 25
symbol grounding problem, 15–18, 17f, 159–61, 160f, 243
symbolic dynamics, 88–89
symbolic processes, 87–88
symbols, 19–20, 108, 145–48, 147f, 245–47
symbol systems, 9–13, 12f, 18–20
synchronization, 188
synchrony, 102, 102f, 131–32
synesthesia, 34, 63
synthesis, emergence through, 83
synthetic modeling approach, 79, 82. See also dynamical systems approach; embodiment
synthetic neurorobotics studies, 6, 7, 263–64
synthetic robotics approach, 7, 267
synthetic robotics experiments, 150, 218
tactile palpation, 34
Tanaka, Keiji, 46
tangency, 89, 89f, 171
Tani, Tohru, 17f, 28, 35, 38
Tanji, J., 51–52, 52f, 53–54, 76, 206
TE. See inferior temporal area
tearing neurons, 65
temporality, 32
temporal patterns, 182
temporoparietal junction (TPJ), 61
TEO. See inferior temporal area
Tettamanti, M., 67
that which appears, 24–25, 24f
Thelen, Ester, 97–98, 99, 211
thinking, thought segmentation of, 146
thoughts, 71–72
  chaos generating, 108
  experiments, 103–5, 104f, 105f
  semantically combinatorial language of, 145
  thinking segmented into, 146
three-dimensional objects, 35–36, 36f
time, subjective sense of, 32. See also objective time
time perception
  phenomenology, 26–29, 176, 186–87, 248–49
tokens, 10, 13
tools, 57–61, 59f
top-down forward prediction, 215
top-down pathway, 63, 164–65, 172, 207–8, 207f, 250
top-down prediction, 60–61, 164–65, 197–98
top-down projection, 144–45
top-down subjective intentions, 263
top-down subjective view, 266
touched, 35
touching, 35
toy, 98–100, 99f
TPJ. See temporoparietal junction
training, actions influenced by, 215. See also developmental training
transient parts, 226
transition rules, 14–15, 14t
transition sequences, 222
transitive actions, 65
transversal intentionality, 28
Trevena, J., 70
Tsuda, I., 168–69
Turing limit, 112, 245
turn taking, 188, 190
Turvey, 95–96
tutored sequences, 227–30, 228f, 229f
tutoring, 237–39, 238f, 240–41, 259–61
two-dimensional objects, 35–36, 36f
two-stage model, 39–41, 40f
Uddén, J., 206–7, 207f
Ueda, Shizuteru, 23
universal grammar, 191
unsteady phase, 168, 169–70
usage-based approach, 191
U-shaped development, 98–100, 99f
V1. See primary visual cortex
V2, 45, 48
V4, 46, 47, 48
Van de Cruys, S., 257–58
Varela, Francisco, 27, 42, 117, 132, 187, 248, 254
vector field, 91–92, 92f
vector flow, 91–92, 92f
vehicles, 103–5, 104f, 105f, 107, 108–9
Vehicles: Experiments in Synthetic Psychology (Braitenberg), 103–6
ventral intraparietal area (VIP), 46
ventral premotor area (PMv), 64–65, 65f
VIP. See ventral intraparietal area
virtual-reality mirror box, 63
virtue, 261
vision
  Merleau-Ponty on, 34–35
  mobile robot with, 162–72, 163f, 167f, 169f, 173, 193–96, 193f
visual agnosia, 56
visual alphabets, 49
visual cortex, 44–47, 45f, 46f, 47f
visual imagery, 228–29
visual objects, 46f, 56–57
visual palpation, 144
visual receptive field, 62
visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
visuo-proprioceptive (VP) flow, 210
visuo-proprioceptive mapping, 131–32, 131f
visuo-proprioceptive (VP) trajectories, 213–15, 214f
voluntary actions, 39
voluntary sequential movements in SMA, 50–53, 52f
von Hofsten, Claes, 143
VP flow. See visuo-proprioceptive flow
VP trajectories. See visuo-proprioceptive trajectories
walking, 126–28, 128f
walking reflex, 98
water hammers, 4–5, 6
Werbos, Paul, 113. See also error back-propagation scheme
Wernicke's area, 191
what pathway, 45, 46, 47f, 162–65, 163f
where pathway, 45, 162–65, 163f
will, 56. See also free will
Wittgenstein, Ludwig, 18
w-judgment time, 69–71, 70f
Wolpert, Daniel, 58
words, 67, 145–48, 147f, 190
World War II (WWII), 94
Yamabico, 132–36, 133f, 135f, 164, 173
  branching, 133, 152–60, 153f, 155f, 157f, 158f, 160f, 176
  intentionality of, 161
  look-ahead prediction, 154–57, 155f, 157f
  navigation experiments with, 153–62, 153f, 155f, 157f, 158f, 160f
  neurodynamic structure, 157–59, 158f
  prediction, 153–57, 153f, 155f, 157f
  RNN, 153–61, 153f, 155f, 157f, 160f
  symbol grounding problem, 159–61, 160f
  trajectories of, 155, 156–57, 157f
Yamashita, Yuuichi, 208, 256–57. See also multiple-timescale recurrent neural network
Yen, S. C., 49