Title: Statistically optimal perception and learning: from behavior to neural representations
Abstract: Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure, and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and re-evaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations, leading to a new, sampling-based framework of how the cortex represents information and uncertainty.

Glossary

Expected reward: the average reward associated with a particular decision, α, when the state of the environment, y, is unknown.
It can be computed by averaging the utility function, U(α, y), which describes the amount of reward obtained when making decision α if the true state of the environment is y, with respect to the posterior distribution, p(y|x), which describes the degree of belief about the state of the environment given some sensory input, x: R(α) = ∫ U(α, y) p(y|x) dy.

Likelihood: the function specifying the probability p(x|y,M) of observing a particular stimulus x for each possible state of the environment, y, under a probabilistic model of the environment, M.

Marginalization: the process by which the distribution of a subset of variables, y1, is computed from the joint distribution of a larger set of variables, {y1, y2}: p(y1) = ∫ p(y1, y2) dy2. (This can be important if, for example, different decisions rely on different subsets of the same set of variables.) Importantly, in a sampling-based representation, in which different neurons represent these different subsets of variables, simply “reading” the activities of only those neurons that represent y1 (e.g. by a downstream brain area) automatically implements this marginalization operation.

Maximum a posteriori (MAP) estimate: in the context of probabilistic inference, an approximation in which, instead of representing the full posterior distribution, only the single value of y with the highest probability under the posterior is considered. (Formally, the full posterior is approximated by a Dirac-delta distribution, an infinitely narrow Gaussian, located at its maximum.) As a consequence, uncertainty about y is no longer represented.

Maximum likelihood estimate: like the MAP estimate, an approximation, but one in which the full posterior is approximated by the single value of y that has the highest likelihood.

Posterior: the probability distribution p(y|x,M) produced by probabilistic inference according to a particular probabilistic model of the environment, M, giving the probability that the environment is in any of its possible states, y, when stimulus x is observed.
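The expected-reward and marginalization operations above can be sketched numerically for discrete states. This is a minimal illustration, not a model from the review: the joint posterior and the utility table U are invented for the example, and the integrals become sums over a discretized state space.

```python
# Hypothetical discretized example: two environmental state variables,
# y1 (3 states) and y2 (2 states), with a joint posterior p(y1, y2 | x).
joint = [[0.10, 0.05],
         [0.30, 0.25],
         [0.20, 0.10]]

# Marginalization: p(y1 | x) = sum over y2 of p(y1, y2 | x).
p_y1 = [sum(row) for row in joint]        # [0.15, 0.55, 0.30]

# Expected reward of each decision a: R(a) = sum_y U(a, y) p(y | x),
# with an illustrative utility table U[a][y] (an assumption, not from the text).
U = [[1.0, 0.0, 0.0],   # decision 0 rewarded only in state y1 = 0
     [0.0, 1.0, 1.0]]   # decision 1 rewarded in states y1 = 1 and 2
R = [sum(u * p for u, p in zip(row, p_y1)) for row in U]   # [0.15, 0.85]
best = max(range(len(R)), key=lambda a: R[a])              # decision 1
```

Note that the decision-relevant quantity here depends only on the marginal over y1; summing out y2 before computing R(α) is exactly the marginalization the glossary describes.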
Prior: the probability distribution p(y|M) defining the expectation, according to a probabilistic model of the environment, M, that the environment is in any of its possible states, y, before any observation is available.

Probabilistic inference: the process by which the posterior is computed. It requires a probabilistic model, M, of stimuli x and states of the environment y, containing a prior and a likelihood. It is necessary when environmental states are not directly available to the observer: they can only be inferred from stimuli by inverting the relationship between y and x through Bayes’ rule: p(y|x,M) = p(x|y,M) p(y|M)/Z, where Z is a factor independent of y that ensures the posterior is a well-defined probability distribution. Note that the posterior is a full probability distribution over environmental states, y, rather than a single estimate. In contrast with approximate inference methods, such as maximum likelihood or maximum a posteriori, which compute single best estimates of y, the posterior fully represents the uncertainty about the inferred variables.

Probabilistic learning: the process of finding a suitable model for probabilistic inference. This can itself be viewed as a problem of probabilistic inference at a higher level, where the unobserved quantity is the model, M, including its parameters and structure. Thus, the complete description of the results of probabilistic learning is a posterior distribution, p(M|X), over possible models given all stimuli observed so far, X. Even though approximate versions, such as maximum likelihood or MAP, compute only a single best estimate of M, they still need to rely on representing uncertainty about the states of the environment, y. The effect of learning is usually a gradual change in the posterior (or estimate) as more and more stimuli are observed, reflecting the incremental nature of learning.
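Bayes’ rule as stated above can be made concrete with a toy discrete inference problem. This is an illustrative sketch, not the review’s model: the three-state prior, the Gaussian observation-noise likelihood, and the observed value x = 1.4 are all assumptions chosen for the example.

```python
import math

# Illustrative setup: a discrete set of environmental states y and a prior p(y | M).
states = [0.0, 1.0, 2.0]
prior = [0.5, 0.3, 0.2]

def likelihood(x, y, noise=1.0):
    """p(x | y, M): assumed Gaussian observation noise around the true state y."""
    return math.exp(-0.5 * ((x - y) / noise) ** 2)

x = 1.4                                            # an observed stimulus
unnorm = [likelihood(x, y) * p for y, p in zip(states, prior)]
Z = sum(unnorm)                                    # the normalizing factor Z
posterior = [u / Z for u in unnorm]                # p(y | x, M), a full distribution

# MAP approximation: keep only the single most probable state,
# discarding the uncertainty that the full posterior represents.
y_map = states[max(range(len(states)), key=lambda i: posterior[i])]
```

The contrast the glossary draws is visible here: `posterior` assigns graded belief to every state, while `y_map` collapses it to one value, so two posteriors with very different spreads could yield the same MAP estimate.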