
Open Access 01-12-2005 | Short paper

Dopamine, uncertainty and TD learning

Authors: Yael Niv, Michael O Duff, Peter Dayan

Published in: Behavioral and Brain Functions | Issue 1/2005

Abstract

Substantial evidence suggests that the phasic activities of dopaminergic neurons in the primate midbrain represent a temporal difference (TD) error in predictions of future reward, with increases above and decreases below baseline consequent on positive and negative prediction errors, respectively. However, dopamine cells have very low baseline activity, which implies that the representation of these two sorts of error is asymmetric. We explore the implications of this seemingly innocuous asymmetry for the interpretation of dopaminergic firing patterns in experiments with probabilistic rewards, which bring about persistent prediction errors. In particular, we show that when the non-stationary prediction errors are averaged across trials, a ramping in the activity of the dopamine neurons should be apparent, whose magnitude depends on the learning rate. This exact phenomenon was observed in a recent experiment, though it was interpreted there in antipodal terms, as a within-trial encoding of uncertainty.
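The averaging argument can be illustrated with a small simulation. This is a minimal sketch, not the authors' model. It assumes a tapped-delay-line state representation with one state per time step, undiscounted TD(0), a reward delivered with probability `p` at the end of each trial, and a hypothetical factor `neg_scale` that shrinks negative errors to mimic the low baseline firing rate. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_ramp(p=0.5, alpha=0.8, n_steps=10, n_trials=5000,
                  neg_scale=1 / 6, seed=0):
    """Trial-averaged, asymmetrically represented TD errors (illustrative sketch).

    Assumptions: one value V[t] per post-stimulus time step (tapped delay line),
    TD(0) with no discounting, a reward of 1 with probability p at the last step,
    and negative errors scaled by the hypothetical factor neg_scale when "shown".
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_steps)
    shown = np.zeros((n_trials, n_steps))  # represented error at each step
    for trial in range(n_trials):
        reward = float(rng.random() < p)
        for t in range(n_steps):
            next_v = V[t + 1] if t + 1 < n_steps else 0.0  # trial ends after last step
            r = reward if t == n_steps - 1 else 0.0
            delta = r + next_v - V[t]  # true TD error, used for learning
            V[t] += alpha * delta
            # Asymmetric representation: only the displayed error is scaled.
            shown[trial, t] = delta if delta > 0 else neg_scale * delta
    # Discard the first half of trials so the initial learning transient is excluded.
    return shown[n_trials // 2:].mean(axis=0)


fast = simulate_ramp(alpha=0.8)
slow = simulate_ramp(alpha=0.1)
print(np.round(fast, 3))  # averaged response rises toward the reward step
print(np.round(slow, 3))  # much flatter before the reward with a small learning rate
```

Once learning settles, the mean of the true TD error is roughly zero at every step, so any positive average in the represented errors comes from the asymmetry alone. It is largest near the reward, where trial-to-trial fluctuations in the learned values are largest, and the ramp preceding the reward grows with `alpha`.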



Publication date: 01-12-2005
Publisher: BioMed Central
Electronic ISSN: 1744-9081
DOI: https://doi.org/10.1186/1744-9081-1-6
