Abstract: The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields have become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields.

In recent years, rapid progress has been made in the related fields of neuroscience and artificial intelligence (AI). At the dawn of the computer age, work on AI was inextricably intertwined with neuroscience and psychology, and many of the early pioneers straddled both fields, with collaborations between these disciplines proving highly productive (Churchland and Sejnowski, 1988; Hebb, 1949; Hinton et al., 1986; Hopfield, 1982; McCulloch and Pitts, 1943; Turing, 1950). However, more recently, the interaction has become much less commonplace, as both subjects have grown enormously in complexity and disciplinary boundaries have solidified. In this review, we argue for the critical and ongoing importance of neuroscience in generating ideas that will accelerate and guide AI research (see Hassabis commentary in Brooks et al., 2012). We begin with the premise that building human-level general AI (or “Turing-powerful” intelligent systems; Turing, 1936) is a daunting task, because the search space of possible solutions is vast and likely only very sparsely populated. We argue that this underscores the utility of scrutinizing the inner workings of the human brain, the only existing proof that such an intelligence is even possible. Studying animal cognition and its neural implementation also has a vital role to play, as it can provide a window into various important aspects of higher-level general intelligence. The benefits to developing AI of closely examining biological intelligence are two-fold.
First, neuroscience provides a rich source of inspiration for new types of algorithms and architectures, independent of and complementary to the mathematical and logic-based methods and ideas that have largely dominated traditional approaches to AI. For example, were a new facet of biological computation found to be critical to supporting a cognitive function, then we would consider it an excellent candidate for incorporation into artificial systems. Second, neuroscience can provide validation of AI techniques that already exist. If a known algorithm is subsequently found to be implemented in the brain, then that is strong support for its plausibility as an integral component of an overall general intelligence system. Such clues can be critical to a long-term research program when determining where to allocate resources most productively. For example, if an algorithm is not quite attaining the level of performance required or expected, but we observe that it is core to the functioning of the brain, then we can surmise that redoubled engineering efforts geared to making it work in artificial systems are likely to pay off.

Of course, from a practical standpoint of building an AI system, we need not slavishly enforce adherence to biological plausibility. From an engineering perspective, what works is ultimately all that matters. For our purposes, then, biological plausibility is a guide, not a strict requirement. What we are interested in is a systems neuroscience-level understanding of the brain, namely the algorithms, architectures, functions, and representations it utilizes. This roughly corresponds to the top two of the three levels of analysis that Marr famously stated are required to understand any complex biological system (Marr and Poggio, 1976): the goals of the system (the computational level) and the process and computations that realize this goal (the algorithmic level). The precise mechanisms by which this is physically realized in a biological substrate are less relevant here (the implementation level). Note that this is where our approach to neuroscience-inspired AI differs from other initiatives, such as the Blue Brain Project (Markram, 2006) or the field of neuromorphic computing systems (Esser et al., 2016), which attempt to closely mimic or directly reverse engineer the specifics of neural circuits (albeit with different goals in mind). By focusing on the computational and algorithmic levels, we gain transferrable insights into general mechanisms of brain function, while leaving room to accommodate the distinctive opportunities and challenges that arise when building intelligent machines in silico.

The following sections unpack these points by considering the past, present, and future of the AI-neuroscience interface. Before beginning, we offer a clarification. Throughout this article, we employ the terms “neuroscience” and “AI.” We use these terms in the widest possible sense. When we say neuroscience, we mean to include all fields that are involved with the study of the brain, the behaviors that it generates, and the mechanisms by which it does so, including cognitive neuroscience, systems neuroscience, and psychology. When we say AI, we mean work in machine learning, statistics, and AI research that aims to build intelligent machines (Legg and Hutter, 2007).

We begin by considering the origins of two fields that are pivotal for current AI research, deep learning and reinforcement learning, both of which took root in ideas from neuroscience. We then turn to the current state of play in AI research, noting many cases where inspiration has been drawn (sometimes without explicit acknowledgment) from concepts and findings in neuroscience. In this section, we particularly emphasize instances where we have combined deep learning with other approaches from across machine learning, such as reinforcement learning (Mnih et al., 2015), Monte Carlo tree search (Silver et al., 2016), or techniques involving an external content-addressable memory (Graves et al., 2016). Next, we consider the potential for neuroscience to support future AI research, looking at both the most likely research challenges and some emerging neuroscience-inspired AI techniques.
While our main focus will be on the potential for neuroscience to benefit AI, our final section will briefly consider ways in which AI may be helpful to neuroscience and the broader potential for synergistic interactions between these two fields.

As detailed in a number of recent reviews, AI has been revolutionized over the past few years by dramatic advances in neural network, or “deep learning,” methods (LeCun et al., 2015; Schmidhuber, 2014). As the moniker “neural network” might suggest, the origins of these AI methods lie directly in neuroscience. In the 1940s, investigations of neural computation began with the construction of artificial neural networks that could compute logical functions (McCulloch and Pitts, 1943). Not long after, others proposed mechanisms by which networks of neurons might learn incrementally via supervisory feedback (Rosenblatt, 1958) or efficiently encode environmental statistics in an unsupervised fashion (Hebb, 1949). These mechanisms opened up the field of artificial neural network research, and they continue to provide the foundation for contemporary research on deep learning (Schmidhuber, 2014).
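These two early learning rules are simple enough to state in a few lines of code. The sketch below is our own illustration (not code from any of the cited works): a Rosenblatt-style perceptron trained on the OR function via supervisory feedback, alongside a one-step Hebbian update that strengthens weights when pre- and post-synaptic units are co-active.

```python
# Minimal sketches of two early learning rules: the perceptron
# (supervised, error-corrective) and Hebb's rule (unsupervised,
# correlation-based). Pure Python, illustrative values only.

def perceptron_train(samples, lr=0.1, epochs=20):
    """samples: list of (inputs, target) pairs with target in {0, 1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - y                              # supervisory feedback
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def hebbian_update(w, x, y, lr=0.1):
    """Strengthen each weight in proportion to pre/post co-activation."""
    return [wi + lr * xi * y for wi, xi in zip(w, x)]

# The perceptron learns a linearly separable function such as OR:
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = perceptron_train(data)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
print([predict(x) for x, _ in data])  # matches the targets [0, 1, 1, 1]
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this training loop settles on correct weights; Hebb's rule, by contrast, uses no error signal at all.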
Not long after this pioneering work, the development of the backpropagation algorithm allowed learning to occur in networks composed of multiple layers (Rumelhart et al., 1985; Werbos, 1974). Notably, the implications of this method for understanding intelligence, including AI, were first appreciated by a group of neuroscientists and cognitive scientists working under the banner of parallel distributed processing (PDP) (Rumelhart et al., 1986). At the time, most AI research was focused on building logical processing systems based on serial computation, an approach inspired in part by the notion that human intelligence involves manipulation of symbolic representations (Haugeland, 1985). However, there was a growing sense in some quarters that purely symbolic approaches might be too brittle and inflexible to solve complex real-world problems of the kind that humans routinely handle. Instead, a growing foundation of knowledge about the brain seemed to point in a very different direction, highlighting the role of stochastic and highly parallelized information processing. Building on this, the PDP movement proposed that human cognition and behavior emerge from dynamic, distributed interactions within networks of simple neuron-like processing units, interactions tuned by learning procedures that adjust system parameters in order to minimize error or maximize reward.
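At its core, backpropagation is just the chain rule applied layer by layer to send error signals backward through the network. A minimal sketch, with arbitrary illustrative weights of our choosing, computes analytic gradients for a one-hidden-layer sigmoid network and verifies one of them against a finite-difference estimate.

```python
import math

# Backpropagation as the chain rule: gradients for a tiny
# one-hidden-layer sigmoid network, checked numerically.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = [sigmoid(sum(wij * xi for wij, xi in zip(row, x))) for row in w1]
    y = sigmoid(sum(w2j * hj for w2j, hj in zip(w2, h)))
    return h, y

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    h, y = forward(w1, w2, x)
    delta_y = (y - t) * y * (1 - y)          # error at the output unit
    grad_w2 = [delta_y * hj for hj in h]
    grad_w1 = [[delta_y * w2j * hj * (1 - hj) * xi for xi in x]
               for w2j, hj in zip(w2, h)]    # error propagated one layer back
    return grad_w1, grad_w2

w1 = [[0.15, -0.2], [0.4, 0.1]]              # illustrative weights
w2 = [0.3, -0.5]
x, t = [1.0, 0.5], 1.0
g1, g2 = backprop(w1, w2, x, t)

# Central-difference check on one output weight:
eps = 1e-6
numeric = (loss(w1, [w2[0] + eps, w2[1]], x, t)
           - loss(w1, [w2[0] - eps, w2[1]], x, t)) / (2 * eps)
print(abs(numeric - g2[0]) < 1e-8)  # analytic and numeric gradients agree
```

The numerical check is the standard way to validate a hand-derived backward pass before trusting it in training.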
Although the PDP approach was at first applied to relatively small-scale problems, it showed striking success in accounting for a wide range of human behaviors (Hinton et al., 1986). Along the way, PDP research introduced a diverse collection of ideas that have had a sustained influence on AI research. For example, current machine translation research exploits the notion that words and sentences can be represented in a distributed fashion (i.e., as vectors) (LeCun et al., 2015), a principle that was already ingrained in early PDP-inspired models of sentence processing (St. John and McClelland, 1990). Building on the PDP movement’s appeal to biological computation, current state-of-the-art convolutional neural networks (CNNs) incorporate several canonical hallmarks of neural computation, including nonlinear transduction, divisive normalization, and maximum-based pooling of inputs (Yamins and DiCarlo, 2016). These operations were directly inspired by single-cell recordings from the mammalian visual cortex that revealed how visual input is filtered and pooled in simple and complex cells in area V1 (Hubel and Wiesel, 1959).
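The three canonical operations just named are easy to illustrate. The toy versions below operate on a 1-D feature vector; real CNNs apply them over 2-D feature maps, and divisive normalization in practice pools over a local neighborhood of units rather than the whole vector, so treat this as a simplified sketch.

```python
# Toy 1-D versions of three canonical neural-computation hallmarks
# found in modern CNNs.

def relu(v):
    # Nonlinear transduction: half-wave rectification of responses.
    return [max(0.0, a) for a in v]

def divisive_normalization(v, sigma=1.0):
    # Each response is divided by the pooled (squared) activity of the
    # population, a simplified form of cortical gain control.
    pooled = sigma + sum(a * a for a in v)
    return [a / pooled for a in v]

def max_pool(v, width=2):
    # Maximum-based pooling: keep the strongest response per window.
    return [max(v[i:i + width]) for i in range(0, len(v), width)]

v = [-1.0, 2.0, 0.5, 3.0]
print(max_pool(relu(v)))  # [2.0, 3.0]
```

Stacking such stages, with learned linear filters in between, yields the increasingly invariant feature hierarchy discussed next.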
Moreover, current network architectures replicate the hierarchical organization of mammalian cortical systems, with both convergent and divergent information flow in successive, nested processing layers (Krizhevsky et al., 2012; LeCun et al., 1989; Riesenhuber and Poggio, 1999; Serre et al., 2007), following ideas first advanced in early neural network models of visual processing (Fukushima, 1980). In both biological and artificial systems, successive non-linear computations transform raw visual input into an increasingly complex set of features, permitting object recognition that is invariant to transformations of pose, illumination, or scale.

As the field of deep learning evolved out of PDP research into a core area within AI, it was bolstered by new ideas, such as the development of deep belief networks (Hinton et al., 2006) and the introduction of large datasets inspired by research on human language (Deng et al., 2009). During this period, it continued to draw key ideas from neuroscience. For example, biological considerations informed the development of successful regularization schemes that support generalization beyond training data. One such scheme, in which only a subset of units participate in the processing of a given training example (“dropout”), was motivated by the stochasticity that is inherent in biological systems populated by neurons that fire with Poisson-like statistics (Hinton et al., 2012). Here and elsewhere, neuroscience has provided initial guidance toward architectural and algorithmic constraints that lead to successful neural network applications for AI.

Alongside its important role in the development of deep learning, neuroscience was also instrumental in erecting a second pillar of contemporary AI, stimulating the emergence of the field of reinforcement learning (RL). RL methods address the problem of how to maximize future reward by mapping states in the environment to actions and are among the most widely used tools in AI research (Sutton and Barto, 1998). Although it is not widely appreciated among AI researchers, RL methods were originally inspired by research into animal learning.
In particular, the development of temporal-difference (TD) methods, a critical component of many RL models, was inextricably intertwined with research into animal behavior in conditioning experiments. TD methods are real-time models that learn from differences between temporally successive predictions, rather than having to wait until the actual reward is delivered. Of particular relevance was an effect called second-order conditioning, where affective significance is conferred on a conditioned stimulus (CS) through association with another CS, rather than directly via association with the unconditioned stimulus (Sutton and Barto, 1981). TD learning provides a natural explanation for second-order conditioning and indeed has gone on to explain a much wider range of findings from neuroscience, as we discuss below. Here, as in the case of deep learning, investigations initially inspired by observations from neuroscience led to further developments that have strongly shaped the direction of AI research. From their neuroscience-informed origins, TD methods and related techniques have gone on to supply the core technology for recent advances in AI, ranging from robotic control (Hafner and Riedmiller, 2011) to expert play in backgammon (Tesauro, 1995) and Go (Silver et al., 2016).
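A toy simulation makes the link to second-order conditioning concrete. In the TD(0) sketch below, with illustrative states and parameters of our own choosing, CS1 is first paired with reward; CS2 is then paired only with CS1 and never with reward, yet it still acquires predictive value, because each TD update bootstraps from the successor state's prediction rather than waiting for reward delivery.

```python
# TD(0) account of second-order conditioning. Stimuli form a chain
# CS2 -> CS1 -> reward; values are learned from successive predictions.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update: move V[s] toward r + gamma*V[s']."""
    target = r + gamma * (V[s_next] if s_next is not None else 0.0)
    V[s] += alpha * (target - V[s])

V = {"CS1": 0.0, "CS2": 0.0}

# Phase 1: CS1 is followed by reward (first-order conditioning).
for _ in range(100):
    td0_update(V, "CS1", r=1.0, s_next=None)

# Phase 2: CS2 is followed by CS1 only; no reward is ever delivered.
for _ in range(50):
    td0_update(V, "CS2", r=0.0, s_next="CS1")

print(V["CS2"] > 0.5)  # CS2 inherits value from CS1's prediction
```

The same prediction-error term, delta = r + gamma*V(s') - V(s), is what later work famously mapped onto phasic dopamine signaling.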
Reading the contemporary AI literature, one gains the impression that the earlier engagement with neuroscience has diminished. However, if one scratches the surface, one can uncover many cases in which recent developments have been inspired and guided by neuroscientific considerations. Here, we look at four specific examples.

The brain does not learn by implementing a single, global optimization principle within a uniform and undifferentiated neural network (Marblestone et al., 2016). Rather, biological brains are modular, with distinct but interacting subsystems underpinning key functions such as memory, language, and cognitive control (Anderson et al., 2004; Shallice, 1988). This insight from neuroscience has been imported, often in an unspoken way, into many areas of current AI. One illustrative example is recent AI work on attention. Until quite recently, most CNN models worked directly on entire images or video frames, with equal priority given to all image pixels at the earliest stage of processing. The primate visual system works differently. Rather than processing all input in parallel, visual attention shifts strategically among locations and objects, centering processing resources and representational coordinates on a series of regions in turn (Koch and Ullman, 1985; Moore and Zirnsak, 2017; Posner and Petersen, 1990). Detailed neurocomputational models have shown how this piecemeal approach benefits behavior, by prioritizing and isolating the information that is relevant at any given moment (Olshausen et al., 1993; Salinas and Abbott, 1997). As such, attentional mechanisms have been a source of inspiration for AI architectures that take “glimpses” of the input image at each step, update internal state representations, and then select the next location to sample (Larochelle and Hinton, 2010; Mnih et al., 2014) (Figure 1A). One such network was able to use this selective attentional mechanism to ignore irrelevant objects in a scene, allowing it to perform well in challenging object classification tasks in the presence of clutter (Mnih et al., 2014).
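The glimpse loop these architectures share can be caricatured in a few lines. In the sketch below, the location policy is a hand-coded placeholder (fixate the brightest unvisited pixel of a toy 1-D "image"); in the cited models, both the state update and the policy are learned networks, with the policy typically trained by reinforcement learning.

```python
# Schematic of the recurrent "glimpse" loop: extract a small patch,
# fold it into an internal state, then choose the next fixation.

def glimpse(image, loc, width=1):
    """Return a small patch of the image centered on loc."""
    lo, hi = max(0, loc - width), min(len(image), loc + width + 1)
    return image[lo:hi]

def attend(image, steps=3):
    state, loc, visited = 0.0, 0, set()
    for _ in range(steps):
        patch = glimpse(image, loc)
        state += sum(patch)          # fold the glimpse into internal state
        visited.add(loc)
        candidates = [i for i in range(len(image)) if i not in visited]
        # Placeholder policy: fixate the brightest unvisited location.
        loc = max(candidates, key=lambda i: image[i])
    return state

image = [0.1, 0.9, 0.0, 0.8, 0.2]    # toy 1-D "image"
summary = attend(image)              # state summarizing three fixations
```

Because only a small patch is processed per step, the cost of each step is independent of the full image size, which is the scaling property discussed next.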
Further, the attentional mechanism allowed the computational cost (e.g., number of network parameters) to scale favorably with the size of the input image. Extensions of this approach were subsequently shown to produce impressive performance at difficult multi-object recognition tasks, outperforming conventional CNNs that process the entirety of the image, both in terms of accuracy and computational efficiency (Ba et al., 2015), as well as enhancing image-to-caption generation (Xu et al., 2015).

While attention is typically thought of as an orienting mechanism for perception, its “spotlight” can also be focused internally, toward the contents of memory. This idea, a recent focus in neuroscience studies (Summerfield et al., 2006), has also inspired work in AI. In some architectures, attentional mechanisms have been used to select information to be read out from the internal memory of the network. This has helped provide recent successes in machine translation (Bahdanau et al., 2014) and led to important advances on memory and reasoning tasks (Graves et al., 2016). These architectures offer a novel implementation of content-addressable retrieval, which was itself a concept originally introduced to AI from neuroscience (Hopfield, 1982).

One further area of AI where attention mechanisms have recently proven useful focuses on generative models, systems that learn to synthesize or “imagine” images (or other kinds of data) that mimic the structure of examples presented during training. Deep generative models (i.e., generative models implemented as multi-layered neural networks) have recently shown striking successes in producing synthetic outputs that capture the form and structure of real visual scenes via the incorporation of attention-like mechanisms (Hong et al., 2015; Reed et al., 2016). For example, in one state-of-the-art generative model known as DRAW, attention allows the system to build up an image incrementally, attending to one portion of a “mental canvas” at a time (Gregor et al., 2015).

A canonical theme in neuroscience is that intelligent behavior relies on multiple memory systems (Tulving, 1985).
These will include not only reinforcement-based mechanisms, which allow the value of stimuli and actions to be learned incrementally and through repeated experience, but also instance-based mechanisms, which allow experiences to be encoded rapidly (in “one shot”) in a content-addressable store (Gallistel and King, 2009). The latter form of memory, known as episodic memory (Tulving, 2002), is most often associated with circuits in the medial temporal lobe, prominently including the hippocampus (Squire et al., 2004).
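A content-addressable store of the kind mentioned above retrieves by similarity to a partial cue rather than by address. A minimal attention-style read, in the general spirit of the memory-augmented networks cited earlier (the vectors and the sharpness parameter beta are illustrative choices of ours, not values from any published model), looks like this:

```python
import math

# Content-addressable memory read: compare a query with every stored
# row, turn the similarities into softmax read weights, and return the
# weighted sum of rows.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(v, beta=5.0):
    """beta sharpens the read distribution toward the best match."""
    exps = [math.exp(beta * x) for x in v]
    z = sum(exps)
    return [e / z for e in exps]

def read(memory, query):
    weights = softmax([dot(row, query) for row in memory])
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]

memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
out = read(memory, query=[0.9, 0.1, 0.0])  # partial, noisy cue
print(out[0] > out[1] > out[2])            # read is dominated by the best match
```

A single new row can be appended in one shot and retrieved from a partial cue on the next read, which is the property that distinguishes this instance-based store from slow, incremental weight learning.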