Title: Semiparametric analysis of regression models for longitudinal data
Abstract: Longitudinal data, in which each subject is measured repeatedly over time, arise in many areas of research, such as public health and economic studies. The defining feature of longitudinal studies is that different subjects are independent of one another, whereas the repeated measurements within a subject are correlated. Because commonly used Generalized Linear Models (GLM) do not account for this within-subject correlation, two types of models with different targets of inference, marginal models and mixed effects models, have been extensively developed over the past 30 years. In this thesis, we investigate new methods that extend these two models to more practical and complicated settings. In Chapter 2, we propose a penalized quadratic inference functions (QIF) approach and apply it to longitudinal data with a large number of covariates. In particular, for variable selection we develop a new QIF that incorporates a regularization technique and a quadratic penalty based on SCAD. Theoretical properties are derived under the scenario of a diverging number of covariates. Extensive simulation studies have been conducted to assess the performance of our proposed …

List of Tables
2.1 Comparison of quadratic SCAD and SCAD with simulated AR(1) normal data when the model correlation is correctly specified, based on 1000 replications.
2.2 Comparison of quadratic SCAD and SCAD with simulated exchangeable normal data when the model correlation is misspecified as AR(1), based on 1000 replications.
2.3 The performance of the proposed RQIF with and without penalty, and of the penalized GEE, for normal data with AR(1) correlation structure, ρ = 0.7. QIF.r&p and QIF.r denote the regularized QIF with and without the proposed quadratic SCAD penalty, respectively, while GEE.p denotes the penalized GEE.
2.4 Results for the normal responses over 1000 simulations, using the regularized QIF with the quadratic SCAD penalty.
2.5 Variable selection for exchangeable Poisson data using the regularized QIF with penalty.
3.1 The mean, sample standard error (s.s.e.) and estimated standard error (e.s.e.) of the model parameter estimator β based on 500 simulations using the proposed MIPQL method.
3.2 Description of baseline characteristics for our PANS data study.
3.3 Inference for the PANS data and the performance of our proposed MIPQL compared with other existing approaches.

1.2 Review of related methods

… subjects. Note that in linear mixed models, the regression parameters for the conditional means also have an interpretation in terms of the population means, just as in marginal models. In general, however, the two models are not interconvertible.

Linear Mixed Models

In an LMM, without loss of generality, each repeated response $Y_{ij}$ is assumed, given the random effect vector $b_i$, to follow a normal distribution with constant conditional