Title: On optimal inference in the linear IV model
Abstract: Quantitative Economics, Volume 10, Issue 2, p. 457-485. Original Articles. Open Access. Donald W. K. Andrews (Cowles Foundation, Yale University), Vadim Marmer (Vancouver School of Economics, University of British Columbia), and Zhengfei Yu (Faculty of Humanities and Social Sciences, University of Tsukuba). Andrews gratefully acknowledges the research support of the National Science Foundation via Grants SES-1355504 and SES-1656313. Marmer gratefully acknowledges the research support of the SSHRC via Grants 410-2010-1394 and 435-2013-0331. The authors thank Yanqin Fan, Patrik Guggenberger, Marcelo Moreira, James MacKinnon, Chen Zhang, and three referees for helpful comments.
First published: 08 May 2019. https://doi.org/10.3982/QE1082

Abstract This paper considers tests and confidence sets (CSs) concerning the coefficient on the endogenous variable in the linear IV regression model with homoskedastic normal errors and one right-hand side endogenous variable. The paper derives a finite-sample lower bound function for the probability that a CS constructed using a two-sided invariant similar test has infinite length and shows numerically that the conditional likelihood ratio (CLR) CS of Moreira (2003) is not always "very close," say 0.005 or less, to this lower bound function. This implies that the CLR test is not always very close to the two-sided asymptotically-efficient (AE) power envelope for invariant similar tests of Andrews, Moreira, and Stock (2006) (AMS). On the other hand, the paper establishes the finite-sample optimality of the CLR test when the correlation between the structural and reduced-form errors, or between the two reduced-form errors, goes to 1 or −1 and other parameters are held constant, where optimality means achievement of the two-sided AE power envelope of AMS. These results cover the full range of (nonzero) IV strength. 
The paper investigates in detail scenarios in which the CLR test is not on the two-sided AE power envelope of AMS. Also, theory and numerical results indicate that the CLR test is close to having the greatest average power, where the average is over a specified grid of concentration parameter values and over a pair of alternative hypothesis values of the parameter of interest, uniformly over all such pairs of alternative hypothesis values and uniformly over the correlation between the structural and reduced-form errors. Here, "close" means 0.015 or less for k ≤ 20, where k denotes the number of IVs, and 0.025 or less for 0 < k ≤ 40. The paper concludes that, although the CLR test is not always very close to the two-sided AE power envelope of AMS, CLR tests and CSs have very good overall properties. 1 Introduction The linear instrumental variables (IV) regression model is one of the most widely used models in economics. It has been widely studied and considerable effort has been made to develop good estimation and inference methods for it. In particular, following the recognition that standard two stage least squares t tests and confidence sets (CSs) can perform quite poorly under weak IVs (see Dufour (1997), Staiger and Stock (1997), and references therein), inference procedures that are robust to weak IVs have been developed, for example, see Kleibergen (2002) and Moreira (2003, 2009). The focus has been on models with one right-hand side endogenous variable, because this arises most frequently in applications, and on over-identified models, because Anderson and Rubin (1949) (AR) tests and CSs are robust to weak IVs and perform very well in exactly-identified models. Andrews, Moreira, and Stock (2006) (AMS) develop a finite-sample two-sided AE power envelope for invariant similar tests concerning the coefficient on the right-hand side endogenous variable in the linear IV model under homoskedastic normal errors and known reduced-form variance matrix. 
They show via numerical simulations that the conditional likelihood ratio (CLR) test of Moreira (2003) has power that is essentially (i.e., up to simulation error) on the power envelope. Chernozhukov, Hansen, and Jansson (2009) show that this power envelope also applies to noninvariant tests provided the envelope is for power averaged over certain direction vectors in a unit sphere. They also show that the invariant similar tests that generate the two-sided AE power envelope are α-admissible and d-admissible. Mikusheva (2010) provides approximate optimality results for CLR-based CSs that utilize the testing results in AMS. Chamberlain (2007), Andrews, Moreira, and Stock (2008), and Hillier (2009) provide related results. It is shown in Dufour (1997) that any CS with correct size must have positive probability of having infinite length at every point in the parameter space. The AR and CLR CSs have this property. In fact, simulation results show that in some over-identified contexts the AR CS has a lower probability of having infinite length than the CLR CS does. For example, consider a model with one right-hand side endogenous variable, k IVs, a concentration parameter λ (which is a measure of the strength of the IVs), homoskedastic normal errors, a correlation $\rho_{uv}$ between the structural-equation error and the reduced-form error (for the first-stage equation) equal to zero, and no covariates. When (k, λ) equals , , , , and , the differences between the probabilities that the CLR and AR CSs have infinite length are 0.013, 0.027, 0.037, 0.043, and 0.049, respectively.1 In fact, one obtains positive differences for all combinations of (k, λ) for and . Hence, in these over-identified scenarios the AR CS outperforms the CLR CS in terms of its infinite-length behavior, which is an important property for CSs. Similarly, one obtains positive (but smaller) differences also when $\rho_{uv} = 0.3$ for the same range of (k, λ) values. 
On the other hand, for , and 0.9, the differences are negative over the same range of values. The AR and CLR CSs are based on inverting AR and CLR tests that fall into the class of invariant similar tests considered in AMS. Hence, the simulation results for and 0.3 raise the question: How can these results be reconciled with the near optimal CLR test and CS results described above? In this paper, we answer this question and related questions concerning the optimality of the CLR test and CS. The contributions of the paper are as follows. First, the paper shows that the probability that an invariant similar CS has infinite length for a fixed true parameter value equals one minus the power against of the test used to construct the CS as the null value goes to ∞ or −∞. This leads to explicit formulae for the probabilities that the AR and CLR CSs have infinite length. This result is established in the paper for homoskedastic errors. It is extended in Section 24 in the Online Supplementary Material 1 (Andrews, Marmer, and Yu (2019)) to the case of heteroskedastic and autocorrelated errors. Second, the paper determines a finite-sample lower bound function on the probabilities that a CS has infinite length for CSs based on invariant similar tests. This lower bound is obtained by using the first result and finding the limit of the power bound in AMS as the null value goes to ∞ or −∞. The lower bound function is found to be very simple. It is a function only of , , and k. These results allow one to compare the probabilities that the AR and CLR CSs have infinite length with the lower bound. Third, simulation results show that the AR and CLR CSs are not always close to the lower bound. This is not surprising for the AR CS, but it is surprising for the CLR CS in light of the AMS results. 
The probabilities that the CLR CS has infinite length are found to be off the lower bound function by a magnitude that is decreasing in , increasing in k, and are maximized over at values that correspond to somewhat weak IVs, but not irrelevant IVs. For , the paper shows (analytically) that the AR test achieves the lower bound function. Hence, for , the probabilities that the CLR CS has infinite length exceed the lower bound by the same amounts as reported above for the difference between the infinite length probabilities of the CLR and AR CSs for several values. On the other hand, for values of , the CLR CS has probabilities of having infinite length that are close to the lower bound function, 0.010 or less and typically much less, for all combinations considered. For values of , the AR CS has probabilities of having infinite length that are often far from the lower bound. For and certain values of , they are as large as 0.089, 0.207, 0.288, 0.357, and 0.426 for , 5, 10, 20, and 40, respectively.2 The AMS numerical results did not detect scenarios where the CLR test's power is off the two-sided power envelope because AMS focused on power for a fixed null hypothesis and a wide range of alternative values, whereas the probability that a CS has infinite length depends on the underlying tests' power for a fixed true parameter and arbitrarily distant null hypothesis values. As discussed in Section 4 below, power in these two scenarios is different. Fourth, the paper derives new optimality properties of the CLR and Lagrange multiplier (LM) tests when or with other parameters fixed at any values (with nonzero concentration parameter), where denotes or and likewise for . In particular, optimality holds for fixed finite nonzero values of the concentration parameter. Optimality here is in the class of invariant similar tests or similar tests and employs the two-sided AE power envelope of AMS. 
These results are empirically relevant because they are consistent with the numerical results that show that the CLR test is close to the power envelope when is large, namely, 0.7 and 0.9, but not extremely close to one. These optimality results hold because taking or with other parameters fixed drives the length of the mean vector of the conditioning statistic T, as defined in AMS and below, to infinity. This is the same mechanism that yields asymptotic optimality of the CLR and LM tests when the concentration parameter goes to infinity as (i.e., under strong or semistrong IVs). The results show that arbitrarily large values of the concentration parameter are not needed for limiting optimality of the CLR and LM tests. Fifth, we simulate power differences between the two-sided AE power envelope of AMS and the power of the CLR test for a fixed alternative value and a range of finite null values (rather than the power differences as discussed above). These power differences are equivalent to the false coverage probability differences between the CLR CS and the corresponding infeasible optimal CS for a fixed true value at incorrect values . We consider a wide range of values. The maximum (over and values) power differences range between over the values considered. On the other hand, the average (over and λ values) power differences only range between . This indicates that, although there are some values at which the CLR test is noticeably off the power envelope, on average the CLR test's power is not far from the power envelope. The maximum power differences over are found to increase in k and decrease in . The values at which the maxima are obtained are found to (weakly) increase with k and decrease in . The values at which the maxima are obtained are found to be independent of k and decrease in . 
Sixth, the paper considers a weighted average power (WAP) envelope with a uniform weight function over a grid of concentration parameter values and the same two-point AE weight function over as in AMS. We refer to this as the WAP2 envelope. We determine numerically how close the power of the CLR test is to the WAP2 envelope. We find that the difference between the WAP2 envelope and the average power of the CLR test is in the range of over all of the values that we consider. Hence, the average power of the CLR test is quite close to the WAP2 envelope. Other papers in the literature that consider WAP include Wald (1943), Andrews and Ploberger (1994), Andrews (1998), Moreira and Moreira (2013, 2015), Elliott, Müller, and Watson (2015), and papers referenced above. The WAP2 envelope considered here is closest to the WAP envelopes in Wald (1943), AMS, and Chernozhukov, Hansen, and Jansson (2009) because the other papers listed put a weight function over all of the parameters in the alternative hypothesis, which yields a single weighted alternative density. In contrast, the WAP2 envelope, Wald (1943), AMS, and Chernozhukov, Hansen, and Jansson (2009) consider a family of weight functions over disjoint sets of parameters in the alternative hypothesis, which yields a WAP envelope. In conclusion, based on our findings, we recommend use of the CLR test and CS in settings with homoskedastic uncorrelated errors. The CLR CS has higher probability of having infinite length than the AR CS in some scenarios, and the CLR test is not a UMP two-sided invariant similar test. But, no such UMP test exists and the CLR CS is close to the two-sided AE power envelope for invariant similar tests when is not close to zero and is close to the WAP2 envelope for all values of . 
In settings where the errors may be heteroskedastic or autocorrelated, tests exist that reduce to the CLR test under homoskedastic and uncorrelated errors, for example, see Andrews, Moreira, and Stock (2004), Andrews and Guggenberger (2018), and I. Andrews and Mikusheva (2016). Other tests designed for the heteroskedastic and/or autocorrelated errors are given in Moreira and Moreira (2015) and I. Andrews (2016). Finally, we point out that the results of this paper illustrate a point that applies more generally than in the linear IV model. In weak identification scenarios, where CSs may have infinite length (or may be bounded only due to bounds on the parameter space), good test performance at a priori implausible parameter values is important for good CS performance at plausible parameter values. More specifically, the probability under an a priori plausible parameter value that a CS has infinite length depends on the power of the test used to construct the CS against when the null value is arbitrarily large, which may be an a priori implausible null value. For the computation of CLR CSs, see Mikusheva (2010). For a formula for the power of the CLR test, see Hillier (2009). The paper is organized as follows. Section 2 specifies the model. Section 3 defines the class of invariant similar tests. Section 4 contrasts the power properties of tests in the scenario where is fixed and takes on large (absolute) values, with the scenario where is fixed and takes on large (absolute) values. Section 5 provides a formula for the probability that a CS has infinite length. Section 6 derives a lower bound on the probability that a CS constructed using two-sided invariant similar tests has infinite length. Section 7 reports differences between the probability that the CLR CS has infinite length and the lower bound derived in the previous section. Section 8 proves the optimality results for the CLR test described above. 
Section 9 reports differences between the power of CLR tests and the two-sided AE power bound of AMS for a wide range of parameter configurations. Section 10 provides comparisons of the power of the CLR test to the WAP2 power envelope described above. Proofs and additional theoretical results are given in the Online Supplementary Material 1. Additional numerical results are given in the Online Supplementary Material 2.

2 Model We consider the same model as in Andrews, Moreira, and Stock (2004, 2006) (AMS04, AMS) but, for simplicity and without loss of generality, without any exogenous variables. The model has one right-hand side endogenous variable, k instrumental variables (IVs), and normal errors with known reduced-form error variance matrix. The model consists of a structural equation and a reduced-form equation: (1) $y_1 = y_2 \beta + u$ and $y_2 = Z\pi + v_2$, where $y_1, y_2 \in R^n$ and $Z \in R^{n \times k}$ are observed variables; $u, v_2 \in R^n$ are unobserved errors; and $\beta \in R$ and $\pi \in R^k$ are unknown parameters. The IV matrix Z is fixed (i.e., nonstochastic) and has full column rank k. The $n \times 2$ matrix of errors $[u : v_2]$ is i.i.d. across rows with each row having a mean zero bivariate normal distribution with positive variances. The two corresponding reduced-form equations are (2) $Y := [y_1 : y_2] = Z\pi a' + V$, where $V := [v_1 : v_2]$, $v_1 := u + v_2 \beta$, and $a := (\beta, 1)'$. The distribution of $Y \in R^{n \times 2}$ is multivariate normal with mean matrix $Z\pi a'$, independence across rows, and reduced-form variance matrix $\Omega \in R^{2 \times 2}$ for each row. For the purposes of obtaining exact finite-sample results, we suppose Ω is known. As in AMS, asymptotic results for unknown Ω and weak IVs are the same as the exact results with known Ω. The parameter space for $(\beta, \pi')'$ is $R^{k+1}$. We are interested in tests of the null hypothesis $H_0: \beta = \beta_0$ and CSs for β. As shown in AMS, $Z'Y$ is a sufficient statistic for $(\beta, \pi')'$. As in Moreira (2003) and AMS, we consider a one-to-one transformation of $Z'Y$: (3) $S := (Z'Z)^{-1/2} Z'Y b_0 \cdot (b_0' \Omega b_0)^{-1/2} \sim N(c_\beta(\beta_0, \Omega) \mu_\pi, I_k)$ and $T := (Z'Z)^{-1/2} Z'Y \Omega^{-1} a_0 \cdot (a_0' \Omega^{-1} a_0)^{-1/2} \sim N(d_\beta(\beta_0, \Omega) \mu_\pi, I_k)$, where $b_0 := (1, -\beta_0)'$, $a_0 := (\beta_0, 1)'$, $\mu_\pi := (Z'Z)^{1/2} \pi$, $c_\beta(\beta_0, \Omega) := (\beta - \beta_0)(b_0' \Omega b_0)^{-1/2}$, and $d_\beta(\beta_0, \Omega) := a' \Omega^{-1} a_0 \cdot (a_0' \Omega^{-1} a_0)^{-1/2}$. As defined, S and T are independent. Note that S and T depend on the null hypothesis value $\beta_0$.

3 Invariant similar tests As in AMS, we consider tests that are invariant to orthonormal transformations of $[S : T]$, that is, $[S : T] \to [FS : FT]$ for a $k \times k$ orthogonal matrix F. 
The $2 \times 2$ matrix Q is a maximal invariant, where (4) $Q := [S : T]'[S : T] = \begin{bmatrix} S'S & S'T \\ S'T & T'T \end{bmatrix} = \begin{bmatrix} Q_S & Q_{ST} \\ Q_{ST} & Q_T \end{bmatrix}$ and $Q_1 := (Q_S, Q_{ST})'$; for example, see Theorem 1 of AMS. Note that $Q_1$ is the first column of Q and the matrix Q depends on the null value $\beta_0$. The statistic Q has a noncentral Wishart distribution because $[S : T]$ is a multivariate normal matrix that has independent rows and common covariance matrix across rows. The distribution of Q depends on π only through the scalar (5) $\lambda := \pi' Z'Z \pi \geq 0$. Leading examples of invariant identification-robust tests in the literature include the AR test, the LM test of Kleibergen (2002) and Moreira (2009), and the CLR test of Moreira (2003). The latter test depends on the standard LR test statistic coupled with a "conditional" critical value that depends on $Q_T$. The LR, LM, and AR test statistics are (6) $LR := \frac{1}{2}\left(Q_S - Q_T + \sqrt{(Q_S - Q_T)^2 + 4 Q_{ST}^2}\right)$, $LM := Q_{ST}^2 / Q_T$, and $AR := Q_S$. The critical values for the LM and AR tests are $\chi^2_{1,1-\alpha}$ and $\chi^2_{k,1-\alpha}$, respectively, where $\chi^2_{m,1-\alpha}$ denotes the $1-\alpha$ quantile of the $\chi^2$ distribution with m degrees of freedom. A test based on the maximal invariant Q is similar if its null rejection rate does not depend on the parameter π that determines the strength of the IVs Z. As in Moreira (2003), the class of invariant similar tests is specified as follows. Let the $[0,1]$-valued statistic $\phi(Q)$ denote a (possibly randomized) test that depends on the maximal invariant Q. An invariant test $\phi(Q)$ is similar with significance level α if and only if $E_{\beta_0}(\phi(Q) \mid Q_T = q_T) = \alpha$ for almost all $q_T > 0$ (with respect to Lebesgue measure), where $E_{\beta_0}(\cdot \mid Q_T = q_T)$ denotes conditional expectation given $Q_T = q_T$ when $\beta = \beta_0$ (which does not depend on π). The CLR test rejects the null hypothesis when (7) $LR > \kappa_{LR,\alpha}(Q_T)$, where $\kappa_{LR,\alpha}(q_T)$ is defined to satisfy $P_{\beta_0}(LR > \kappa_{LR,\alpha}(q_T) \mid Q_T = q_T) = \alpha$ and the conditional distribution of $Q_1$ given $Q_T$ is specified in AMS and in (26) in the Online Supplementary Material 1. The invariance condition discussed above is a rotational invariance condition. In some cases, we also consider a sign invariance condition. A test that depends on $[S : T]$ is sign invariant if it is invariant to the transformation $[S : T] \to [-S : T]$. A rotation invariant test is also sign invariant if it depends on $Q_{ST}$ only through $|Q_{ST}|$. Tests that are sign invariant are two-sided tests. 
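The statistics and the conditional critical value described above can be illustrated with a short simulation. The sketch below is not taken from the paper: it assumes the standard expressions from Moreira (2003) and AMS, namely AR = Q_S, LM = Q_ST^2 / Q_T, and LR = (Q_S − Q_T + sqrt((Q_S − Q_T)^2 + 4 Q_ST^2)) / 2, and it approximates the CLR critical value by Monte Carlo using the fact that, under the null, S is N(0, I_k) and independent of T. The function names, the number of replications, and the seed are hypothetical choices made for illustration.

```python
import math
import random


def invariant_stats(S, T):
    """Return (Q_S, Q_ST, Q_T) = (S'S, S'T, T'T) for two k-vectors."""
    q_s = sum(s * s for s in S)
    q_st = sum(s * t for s, t in zip(S, T))
    q_t = sum(t * t for t in T)
    return q_s, q_st, q_t


def ar_stat(q_s, q_st, q_t):
    """Anderson-Rubin statistic (assumed form: AR = Q_S)."""
    return q_s


def lm_stat(q_s, q_st, q_t):
    """LM statistic (assumed form: Q_ST^2 / Q_T); requires q_t > 0."""
    return q_st ** 2 / q_t


def lr_stat(q_s, q_st, q_t):
    """LR statistic (assumed form from Moreira (2003))."""
    d = q_s - q_t
    return 0.5 * (d + math.sqrt(d * d + 4.0 * q_st ** 2))


def clr_critical_value(q_t, k, alpha=0.05, reps=20000, seed=0):
    """Monte Carlo approximation to the conditional critical value.

    Under H0, S ~ N(0, I_k) independently of T. By rotation invariance we
    may condition on T = (sqrt(q_t), 0, ..., 0)', so that Q_ST = S_1 sqrt(q_t).
    """
    rng = random.Random(seed)
    root = math.sqrt(q_t)
    draws = []
    for _ in range(reps):
        S = [rng.gauss(0.0, 1.0) for _ in range(k)]
        q_s = sum(s * s for s in S)
        draws.append(lr_stat(q_s, S[0] * root, q_t))
    draws.sort()
    # Empirical (1 - alpha) quantile of the simulated conditional distribution.
    return draws[min(reps - 1, int(math.ceil((1.0 - alpha) * reps)) - 1)]
```

In this sketch the simulated critical value moves from roughly the chi-square quantile with k degrees of freedom when q_t is near zero toward roughly the chi-square quantile with one degree of freedom when q_t is large, which is the sense in which the CLR critical value is "conditional" on Q_T.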
In fact, AMS show that the two-sided AE power envelope is identical to the power envelope generated by sign and rotation invariant tests; see (4.11) in AMS. For simplicity, we will use the term invariant test to mean a rotation invariant test and the term sign and rotation invariant test to describe a test that satisfies both invariance conditions. The paper also provides some results that apply to tests that satisfy no invariance properties. A test $\phi([S : T])$ (that is not necessarily invariant) is similar with significance level α if and only if $E_{\beta_0}(\phi([S : T]) \mid T = t) = \alpha$ for almost all t (with respect to Lebesgue measure), where $E_{\beta_0}(\cdot \mid T = t)$ denotes conditional expectation given $T = t$ when $\beta = \beta_0$ (which does not depend on π); see Moreira (2009).

4 Power against distant alternatives compared to distant null hypotheses In this section, we consider the power properties of tests when $|\beta_0 - \beta_*|$ is large, where $\beta_*$ denotes the true value of β. We compare scenario 1, where $\beta_0$ and Ω are fixed, and $\beta_*$ takes on large (absolute) values, to scenario 2, where $\beta_*$ and Ω are fixed, and $\beta_0$ takes on large (absolute) values. Scenario 1 yields the power function of a test against distant alternatives. Scenario 2 yields the false coverage probabilities of the CS constructed using the test for distant null hypotheses (from the true parameter value $\beta_*$). We show that, while power goes to one in scenario 1 as $\beta_* \to \pm\infty$ for fixed $\beta_0$ for standard tests, it is not true that power goes to one in scenario 2 as $\beta_0 \to \pm\infty$ for fixed $\beta_*$. Hence, the power properties of tests are quite different in scenarios 1 and 2. The numerical power function and power envelope calculations in AMS are all of the type considered in scenario 1. The difference in power properties of tests between scenarios 1 and 2 suggests that it is worth exploring the properties of tests in scenarios of the latter type as well. 
We do this in the paper and show that the finding of AMS that the CLR test is essentially on the two-sided AE power envelope and is always at least as powerful as the AR test does not hold when one considers a broader range of null and alternative hypothesis values than considered in the numerical results in AMS. It is convenient to consider the AR test, which is the simplest test. The AR test rejects $H_0: \beta = \beta_0$ when $AR > \chi^2_{k,1-\alpha}$. When the true value is β, the distribution of the AR statistic is noncentral $\chi^2$ with noncentrality parameter (8) $\lambda (\beta - \beta_0)^2 / (b_0' \Omega b_0)$, where $b_0 := (1, -\beta_0)'$, and k degrees of freedom. For the fixed null hypothesis $H_0: \beta = \beta_0$, fixed Ω, and fixed λ, the power at the alternative hypothesis value $\beta_*$ is determined by the noncentrality parameter in (8) evaluated at $\beta = \beta_*$. We have (9) $\lim_{\beta_* \to \pm\infty} \lambda (\beta_* - \beta_0)^2 / (b_0' \Omega b_0) = \infty$. Hence, the power of the AR test goes to one as $\beta_* \to \pm\infty$. On the other hand, if one fixes the alternative hypothesis value $\beta_*$ and one considers the limit as $\beta_0 \to \pm\infty$, then one obtains (10) $\lim_{\beta_0 \to \pm\infty} \lambda (\beta_* - \beta_0)^2 / (b_0' \Omega b_0) = \lim_{\beta_0 \to \pm\infty} \lambda (\beta_* - \beta_0)^2 / (\omega_1^2 - 2 \omega_{12} \beta_0 + \omega_2^2 \beta_0^2) = \lambda / \omega_2^2$, where $\omega_1^2$, $\omega_{12}$, and $\omega_2^2$ denote the (1,1), (1,2), and (2,2) elements of Ω, respectively. Hence, the power of the AR test does not go to one as $\beta_0 \to \pm\infty$ even though $|\beta_* - \beta_0| \to \infty$. This occurs because the structural equation error variance under the null value, $b_0' \Omega b_0$, diverges to infinity as $\beta_0 \to \pm\infty$. The difference between the results in (9) and (10) is easy to show for the AR test, but it also holds for Kleibergen's and Moreira's LM test and Moreira's CLR test. For brevity, we do not provide such results here. Note that Davidson and MacKinnon (2008, Section 4) provided different, but somewhat related, results to those in this section.3 They consider power when $\beta_0$ is fixed and $\beta_*$ takes on large (absolute) values (as in scenario 1) but when the correlation $\rho_{uv}$ (between the structural-equation error u and the reduced-form error $v_2$) is held fixed and the structural equation error variance is estimated. In contrast, the results given here are for the case where the correlation between the reduced-form errors $v_1$ and $v_2$ is held fixed because it can be consistently estimated, and hence, in large samples can be treated as fixed and known. This is not true for $\rho_{uv}$. In the Davidson and MacKinnon (2008) scenario, power does not go to one as $\beta_* \to \pm\infty$ for fixed $\beta_0$. 
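The contrast between the two scenarios for the AR test can be checked numerically. The following is a minimal sketch, assuming the AR noncentrality parameter has the standard form λ(β − β0)^2 / (b0'Ωb0) with b0 = (1, −β0)'. The function name and the particular values of λ and Ω are hypothetical and chosen only for illustration.

```python
def ar_noncentrality(beta_true, beta_null, lam, omega):
    """Noncentrality of the AR statistic (assumed standard form).

    omega is a 2x2 nested sequence holding the reduced-form variance matrix;
    the denominator is b0' Omega b0 with b0 = (1, -beta_null)'.
    """
    w11, w12 = omega[0][0], omega[0][1]
    w22 = omega[1][1]
    denom = w11 - 2.0 * w12 * beta_null + w22 * beta_null ** 2
    return lam * (beta_true - beta_null) ** 2 / denom


# Hypothetical illustrative values (not from the paper).
omega = ((1.0, 0.3), (0.3, 2.0))
lam = 10.0

# Scenario 1: fixed null value 0, increasingly distant true values.
scenario1 = [ar_noncentrality(b, 0.0, lam, omega) for b in (1e1, 1e2, 1e3)]

# Scenario 2: fixed true value 0, increasingly distant null values.
scenario2 = [ar_noncentrality(0.0, b0, lam, omega) for b0 in (1e1, 1e3, 1e6)]

# Limit of the scenario 2 sequence: lam divided by the (2,2) element of Omega.
limit = lam / omega[1][1]
```

Under these assumed values the scenario 1 sequence grows without bound, so AR power tends to one, whereas the scenario 2 sequence settles near `limit`, so AR power stays bounded away from one.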
5 Probability that a confidence set has infinite length In this section, we show that the probability that a CS has infinite length is given by one minus the power of the test used to construct the CS as the null value of the test goes to ∞ or −∞. This provides motivation for interest in the power of tests as . It shows why high power against distant null hypotheses is highly desirable. We sometimes make the dependence of Q, S, and T on Y and explicit and write (11) We denote the , , and elements of by , , and , respectively. Let (12) be a (nonrandomized) invariant similar level α test for testing for fixed known Ω, where is a test statistic and is a (possibly data-dependent) critical value. Examples include the AR, LM, and CLR tests in (6). Let be the level CS corresponding to ϕ. That is, (13) We say has right (or left) infinite length, which we denote by (or ), if (14) We say has infinite length, which we denote by , if it has right and left infinite lengths. A CS with infinite length contains a set of the form for some . Let denote probability for events determined by Y when Y has a multivariate normal distribution with means matrix , independence across rows, and variance matrix Ω for each row. Let denote probability for events determined by Q when and has the multivariate normal distribution in (3) with and . In this case, Q has a noncentral Wishart distribution whose density is given in (25) in the Online Supplementary Material 1. For fixed true value and reduced-form variance matrix Ω, let denote the corresponding structural variance matrix of each row of . Let denote the correlation between the structural and reduced-form errors, that is, the correlation corresponding to . Some calculations show that (15) where , , and are the elements of Ω; see (32) in the Online Supplementary Material 1. By the first equality in the second line of (15), , , and . 
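For the AR CS in particular, the statement at the start of this section can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it assumes that, as the null value diverges, the AR statistic is noncentral chi-square with k degrees of freedom and noncentrality λ/ω_2^2, where ω_2^2 is the (2,2) element of Ω, so that the probability of infinite length is the probability that such a variable does not exceed the chi-square critical value. The function name, the replication count, and the seed are hypothetical, and the 0.95 chi-square quantiles are hardcoded for a few values of k.

```python
import math
import random

# 0.95 quantiles of the chi-square distribution for selected degrees of freedom.
CHI2_95 = {1: 3.8415, 2: 5.9915, 5: 11.0705}


def prob_ar_cs_infinite(k, limit_noncentrality, reps=50000, seed=0):
    """Monte Carlo estimate of P(noncentral chi2_k <= chi2_{k,0.95}).

    limit_noncentrality is the assumed limiting noncentrality of the AR
    statistic as the null value diverges (lambda over the (2,2) element
    of Omega under the assumption stated in the lead-in).
    """
    rng = random.Random(seed)
    cv = CHI2_95[k]
    # Place all of the noncentrality in the first coordinate; the
    # distribution depends only on the squared length of the mean vector.
    shift = math.sqrt(limit_noncentrality)
    hits = 0
    for _ in range(reps):
        stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(k - 1):
            stat += rng.gauss(0.0, 1.0) ** 2
        if stat <= cv:
            hits += 1
    return hits / reps
```

With a limiting noncentrality of zero (irrelevant IVs) the estimate is close to 0.95, and it falls as the limiting noncentrality grows, which matches the idea that stronger IVs make an unbounded CS less likely.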
It is shown in Lemma 16.1 in the Online Supplementary Material 1 that the limit as of with Y fixed is (16) which is the same whether or −∞, where , , and . Let denote the element of . It is also shown in Lemma 16.1 in the Online Supplementary Material 1 that has a noncentral Wishart distribution with means matrix and identity variance matrix.4 Theorem 5.1.Suppose is a CS based on invariant level α tests whose test statistic and critical value functions, and , respectively, are continuous at all positive definite matrices q and positive constants , for in parts (a) and (c) below and in parts (b) and (c) below. Then, for all , , and Ω positive definite; (a) , (b) , and (c) . Comments. (i). For the AR, LM, and LR tests, the continuity conditions on and hold given their simple functional forms in (6) using the assumption that for the LM statistic and using the continuity of , which holds by the argument in the proof of Theorem 10.1 in Andrews and Guggenberger (2017). We have for the AR and LM tests because is a constant and is absolutely continuous with respect to Lebesgue measure. For the CLR test, by the argument given in the proof of Theorem 6.4 in the Online Supplementary Material 1. The AR, LM, and CLR test statistics are sign invariant. Hence, parts (a)–(c) of Theorem 5.1 apply to these tests. Theorem 6.4(a)–(c) below provides formulae for the quantities , which appear in Theorem 5.1, for the AR, LM, and CLR tests. (ii). Comment (iii) to Theorem 6.2 below provides a lower bound on over all sign and rotation invariant similar level α tests. Combining this with Theorem 5.1(c) yields a lower bound on the probability that a CS based on such tests has . Theorem 13.1 in the Online Supplementary Material 1 provides lower bounds on over all invariant similar level α tests. Combining these with Theorem 5.1(a) and (b) yields lower bounds on the probabilities that a CS has based on and based on . (iii). Note that Theorem 5.1 does not impose similarity, just invariance. 
The results of Theorem 5.1(a) and (b) also hold for a that is based on level α tests that are not invariant. Denote such tests by and suppose their test statistic and critical value functions, and , respectively, are continuous at all matrices and k vectors t and satisfy for , where and . In this case, for all , , and Ω positive definite, and likewise with , , and in place of , , and . (iv). By Dufour (1997), all CSs for β with correct size must have positive probability of having infinite length (assuming π is not bounded away from 0). In consequence, expected CS length, which is a standard measure of the performance of a CS, is infinite for all identification-robust CSs. Due to this, Mikusheva (2010) compared CSs based on their expected truncated lengths for various truncation values. The result of Theorem 6.2 below implies that, for two CSs where the right-hand side of Theorem 6.2(c) is smaller for the first CS than the second, the first CS has smaller expected truncated length than the second for sufficiently large truncation values. (v). Section 24 in the Online Supplementary Material 1 extends Theorem 5.1 to the linear IV model that allows for heteroskedasticity and/or autocorrelation in the errors. (vi). As indicated in part (c), the