Published in: Behavioral and Brain Functions 1/2005

Open Access 01-12-2005 | Short paper

Dopamine, uncertainty and TD learning

Authors: Yael Niv, Michael O Duff, Peter Dayan

Abstract

Substantial evidence suggests that the phasic activities of dopaminergic neurons in the primate midbrain represent a temporal difference (TD) error in predictions of future reward, with increases above and decreases below baseline consequent on positive and negative prediction errors, respectively. However, dopamine cells have very low baseline activity, which implies that the representation of these two sorts of error is asymmetric. We explore the implications of this seemingly innocuous asymmetry for the interpretation of dopaminergic firing patterns in experiments with probabilistic rewards, which bring about persistent prediction errors. In particular, we show that when the non-stationary prediction errors are averaged across trials, a ramping in the activity of the dopamine neurons should be apparent, with a magnitude that depends on the learning rate. This exact phenomenon was observed in a recent experiment, although it was interpreted there in antipodal terms, as a within-trial encoding of uncertainty.
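The argument in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code, and every name and parameter value in it is an assumption chosen for illustration: a trial is a chain of `n_steps` time steps between stimulus and reward, reward arrives at the last step with probability `p_reward`, values are learned with a tabular TD(0) rule with learning rate `alpha` and no discounting, and the asymmetric coding is modeled by multiplying negative errors by `neg_scale` before averaging. The TD error at each step is the standard δ_t = r_t + V(s_{t+1}) − V(s_t).

```python
import random


def simulate_ramp(n_steps=10, n_trials=5000, burn_in=1000, p_reward=0.5,
                  alpha=0.1, neg_scale=1 / 6, seed=0):
    """Average TD errors across trials, with and without asymmetric scaling.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns (raw_mean, scaled_mean), one entry per within-trial time step.
    """
    rng = random.Random(seed)
    # values[t] is the predicted future reward at step t; the extra final
    # entry is the terminal state, whose value stays at zero.
    values = [0.0] * (n_steps + 1)
    raw_sum = [0.0] * n_steps
    scaled_sum = [0.0] * n_steps

    for trial in range(burn_in + n_trials):
        rewarded = rng.random() < p_reward
        for t in range(n_steps):
            # Reward can only arrive on the last step of the trial.
            reward = 1.0 if (rewarded and t == n_steps - 1) else 0.0
            # TD(0) prediction error with no discounting.
            delta = reward + values[t + 1] - values[t]
            values[t] += alpha * delta
            if trial >= burn_in:
                raw_sum[t] += delta
                # Asymmetric coding: negative errors are compressed.
                scaled_sum[t] += delta if delta > 0 else neg_scale * delta

    raw_mean = [s / n_trials for s in raw_sum]
    scaled_mean = [s / n_trials for s in scaled_sum]
    return raw_mean, scaled_mean


raw_mean, scaled_mean = simulate_ramp()
for t, (raw, scaled) in enumerate(zip(raw_mean, scaled_mean)):
    print(f"step {t}: raw mean {raw:+.4f}, asymmetric mean {scaled:+.4f}")
```

Under these assumptions the symmetric average stays close to zero at every step, because positive and negative errors cancel across trials. The asymmetrically scaled average is positive and larger near the time of reward, where trial-to-trial errors are largest. Changing `alpha` changes how strongly errors propagate back from the reward, which is the sense in which the size of the ramp depends on the learning rate.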
Metadata
Title
Dopamine, uncertainty and TD learning
Authors
Yael Niv
Michael O Duff
Peter Dayan
Publication date
01-12-2005
Publisher
BioMed Central
Published in
Behavioral and Brain Functions / Issue 1/2005
Electronic ISSN: 1744-9081
DOI
https://doi.org/10.1186/1744-9081-1-6