Abstract: In Chap. 2 the bias-variance tradeoff was introduced, along with approaches that regulate model complexity through some parameter λ. But how should λ be chosen? This touches on a fundamental issue in statistical model fitting and parameter estimation: we usually only have available a comparatively small sample from a much larger population, yet we really want to make statements about the population as a whole. Now, if we choose a sufficiently flexible model, e.g., a local or spline regression model with many parameters, we can always achieve a perfect fit to the training data, as we already saw in Chap. 2 (see Fig. 2.5). The problem is that such a fit may no longer say much about the true underlying population, since we may have mainly fitted noise: we have overfit the data, and consequently our model would generalize poorly to new observations not used for fitting. As a side note, it is not only the nominal number of parameters that matters here, but also the functional form or flexibility of the model and the constraints placed on its parameters. For instance, we obviously cannot accurately capture a nonlinear functional relationship with a (globally) linear model, regardless of how many parameters it has. Conversely, as noted before, in basis expansions and kernel approaches the effective number of parameters may be much smaller than the nominal one, because the variables are constrained by their functional relationships. This chapter, especially the following discussion and Sects. 4.1–4.4, largely develops along the exposition in Hastie et al. (2009; but see also the brief discussion in Bishop, 2006, from a slightly different angle).
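The overfitting phenomenon described above can be sketched in a few lines. This is an illustrative example, not taken from the chapter: a high-degree polynomial (standing in for an overly flexible model) drives the training error toward zero while its error on a fresh test sample stays large, whereas a moderately flexible fit does not show this gap. The data-generating process (a sine curve plus Gaussian noise) and the degrees 3 and 15 are arbitrary choices for the demonstration.

```python
# Sketch of overfitting with polynomial regression (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
# Small training sample and a separate, larger test sample from the
# same "population": y = sin(2*pi*x) + noise.
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.3, x_test.size)

def train_test_mse(degree):
    # Fit a degree-`degree` polynomial to the training data only,
    # then evaluate mean squared error on train and test sets.
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    mse_test = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    return mse_train, mse_test

train3, test3 = train_test_mse(3)    # moderate flexibility
train15, test15 = train_test_mse(15)  # near-interpolating fit

# The degree-15 fit has lower training error than the degree-3 fit,
# but its test error typically exceeds its training error by far:
# it has mainly fitted the noise.
```

Choosing the degree (the analogue of λ here) by training error alone would always favor the most flexible model; this is precisely why held-out data or related criteria are needed, as the chapter goes on to discuss.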
Publication Year: 2017
Publication Date: 2017-01-01
Language: en
Type: book-chapter
Indexed In: Crossref