Title: Handling missing data in analyses of the UK women's cohort study
Abstract: Missing values are a problem in large-scale surveys with extensive questionnaires. The analysis of the complete records may yield inferences substantially different from those that would be obtained had no data been missing.
The aim of this dissertation is to critically examine ways of handling missing data in the UK Women's Cohort Study (UKWCS). This is a large dataset of continuous, categorical and binary variables, almost all of which have missing values.
Several simple imputation techniques, as well as multiple imputation as developed by Rubin (1987) and multiple imputation by chained equations using Gibbs sampling (Van Buuren, 1999), were explored in a number of illustrative analyses associated with the UKWCS.
Three approaches to handling missing dietary information on alcohol consumption were compared. The comparison shows that ignoring missingness by analysing only complete cases produces bias (lower means). Imputing the extreme value zero, as is customary at present, underestimates actual alcohol consumption and also incorrectly increases the apparent precision of estimation (i.e. gives inappropriately small standard errors).
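The effect of zero imputation described above can be illustrated with a minimal sketch. The intake values below are hypothetical and are not UKWCS data; they were chosen to be right-skewed, as alcohol intake often is. In a sample like this, filling missing answers with zero lowers the mean and, because the sample size is inflated, also shrinks the standard error. With other data the size of the effect may differ.

```python
import math
import statistics

def mean_and_se(values):
    """Return the sample mean and the standard error of the mean."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

# Hypothetical weekly alcohol intakes (units); None marks a missing answer.
intake = [0.0, 1.0, None, 1.0, 2.0, None, None, 2.0,
          3.0, None, 5.0, None, None, 30.0, None, None]

# Approach 1: analyse only the records with an observed value.
complete_cases = [x for x in intake if x is not None]

# Approach 2: replace every missing answer with zero.
zero_imputed = [0.0 if x is None else x for x in intake]

cc_mean, cc_se = mean_and_se(complete_cases)
zi_mean, zi_se = mean_and_se(zero_imputed)

print(f"complete cases: n={len(complete_cases)}, mean={cc_mean:.2f}, se={cc_se:.2f}")
print(f"zero imputed:   n={len(zero_imputed)}, mean={zi_mean:.2f}, se={zi_se:.2f}")
```

The smaller standard error in the second line of output reflects the imputed zeros being treated as if they were observed, not any real gain in information.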
A published study (Pollard et al., 2001), which based its conclusion on one third of the records, was replicated after handling missing data by multiple imputation. Multiple imputation by chained equations, an iterative technique that deals with missing values when every variable is incomplete, was applied. This method greatly improved the results by utilizing most of the information in the incomplete records. The method has the advantage that the algorithm intended for analysing the complete data is applied several times, without any alterations. The implications of missing data were also studied in a survival analysis investigating the link between incidence of breast cancer and a number of prognostic factors. The thesis recommends multiple imputation for handling missing data, because it exploits most of the information in the dataset and allows efficient inferences to be made in subsequent analyses.
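As a rough illustration of applying the complete-data analysis several times, the sketch below pools per-imputation results with the standard combining rules for multiple imputation (Rubin's rules). The function name `pool_rubin` and the five estimates and variances are hypothetical and are not results from the dissertation.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m complete-data estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)       # average of the m estimates
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical output of the same analysis run on five imputed datasets:
# one point estimate and one squared standard error per dataset.
estimates = [5.0, 5.4, 4.8, 5.2, 5.1]
variances = [0.30, 0.28, 0.33, 0.29, 0.30]

pooled_mean, pooled_var = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled_mean:.2f}, total variance: {pooled_var:.2f}")
```

The total variance exceeds the average within-imputation variance because the between-imputation term carries the extra uncertainty due to the missing values.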
Publication Year: 2004
Publication Date: 2004-01-01
Language: en
Type: dissertation