Dissertation: Abstract

The repeated measures model is very important in contemporary research, but missing or censored data are almost impossible to avoid in such studies. To make more efficient use of all available information, we have developed a systematic approach for analyzing data with incomplete values. Because the repeated measures model can be formulated as a special case, we focus on general mixed models with exponential family distributions, and on the multivariate normal distribution in particular. In addition, the covariance and other data structures are handled through the related Jacobian matrices, so that any parameters of interest can be estimated in a consistent fashion.

The approach is fully parametric and based on the marginal likelihood. Expectations are taken over the incomplete measures, conditional on the observed information. It is conceptually straightforward but computationally challenging: the likelihood function can be extremely complex because it involves integrals of very high dimension. In the past, the cost of calculating such integrals made this type of approach prohibitive. As computing power has improved rapidly, such calculations have become feasible with the proper techniques. In this research, the EM algorithm is employed and implemented by Markov chain Monte Carlo via an augmented Gibbs sampler. The use of latent variables greatly simplifies the sampler, and this tactic is ideal for sampling incomplete data in Monte Carlo EM (MCEM) algorithms. In the spirit of this sampler, our methodology can be extended to a much broader class of models to handle incomplete data.
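
As a rough illustration of the Monte Carlo EM idea described above, the following sketch treats a deliberately simplified case that is not the dissertation's model: a univariate normal sample in which some values are right-censored at a known limit. The function names, the censoring setup, and the default settings are all assumptions made for this example. Because the toy problem is one-dimensional, the censored values are drawn directly from a truncated normal by the inverse-CDF method; in a multivariate setting a Gibbs sampler would take the place of this direct draw.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mu, sigma, lower, rng):
    """Draw from N(mu, sigma^2) conditioned on exceeding `lower` (inverse-CDF method)."""
    dist = NormalDist(mu, sigma)
    u = rng.uniform(dist.cdf(lower), 1.0)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


def mcem_censored_normal(observed, n_censored, limit,
                         n_iter=40, n_draws=50, seed=0):
    """Toy MCEM for a normal sample with `n_censored` values known only to exceed `limit`.

    Hypothetical helper for illustration; returns the final (mu, sigma).
    """
    rng = random.Random(seed)
    n = len(observed) + n_censored
    obs_sum = sum(observed)
    obs_sq = sum(y * y for y in observed)

    # Start from the naive estimates that ignore the censored values.
    mu = obs_sum / len(observed)
    sigma = (sum((y - mu) ** 2 for y in observed) / len(observed)) ** 0.5

    for _ in range(n_iter):
        # Monte Carlo E-step: approximate the conditional expectations of the
        # sufficient statistics (sum and sum of squares) of the censored values.
        draw_sum = 0.0
        draw_sq = 0.0
        for _ in range(n_draws):
            for _ in range(n_censored):
                z = sample_truncated_normal(mu, sigma, limit, rng)
                draw_sum += z
                draw_sq += z * z
        exp_sum = draw_sum / n_draws
        exp_sq = draw_sq / n_draws

        # M-step: complete-data normal MLE with the expected statistics plugged in.
        mu = (obs_sum + exp_sum) / n
        sigma = max((obs_sq + exp_sq) / n - mu * mu, 1e-12) ** 0.5

    return mu, sigma
```

Calling `mcem_censored_normal(observed, n_censored, limit)` with the uncensored values, the count of censored values, and the censoring limit returns the estimated mean and standard deviation. The estimates carry Monte Carlo noise, which can be reduced by raising `n_draws`.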

Simulations and real data analyses demonstrate the application and performance of this approach. It works well even when the proportion of incomplete data is very high. The real data results also show an improvement over those of conventional methods.