Ph.D. Dissertation: Repeated Measures with Censored Data

Chapter 4   Mixed Model with Censored Data

In this section we deal with mixed models, as given in §2.2, with censored data. The model here is formulated in terms of a single vector for each sample unit (e.g. subject), instead of one vector comprising all sample units as in (2.6). For each sample unit, define the mixed model as follows. Suppose that each observation vector y has k components, and there are p fixed effects and q random effects. Then y, of order k×1, for each unit is taken to have model equation

y = Xβ + Zu + e. (4.1)

The p fixed effects on each y are represented by the vector β of order p×1 with its design matrix X of order k×p, and the q random effects by the vector u of order q×1 with its design matrix Z of order k×q.
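As an illustrative aside, the following Python sketch simulates one unit from model (4.1), taking u and e to be independent normal vectors as in §4.3. It is a minimal sketch only; the dimensions, design matrices and parameter values are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
k, p, q = 4, 2, 1                          # components, fixed effects, random effects
X = rng.normal(size=(k, p))                # design matrix for fixed effects
Z = np.ones((k, q))                        # design matrix for random effects (random intercept)
beta = np.array([1.0, -0.5])               # fixed effect vector, p x 1
Sigma1 = 0.5 * np.eye(k)                   # var(e)
Sigma2 = np.array([[2.0]])                 # var(u)

u = rng.multivariate_normal(np.zeros(q), Sigma2)   # u ~ Nq(0, Sigma2)
e = rng.multivariate_normal(np.zeros(k), Sigma1)   # e ~ Nk(0, Sigma1)
y = X @ beta + Z @ u + e                           # model equation (4.1)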

Meanwhile, as in Chapter 3, we also assume that some components of the vector y may not be completely observed. Also, more generally than in §2.2, the variance-covariance matrix of y can be essentially arbitrary.

 
4.1   Data Specification

To facilitate the general incompleteness mechanism, we will use results from Chapter 3. However, in the mixed model each unit y generally has its own distribution, due to the different design matrices X and Z, so we use the results of §3.3.3 and assign each unit its own density function.

Given φ, assume the random effect vector u follows a distribution with density h2(u|φ2) on some sample space 𝒰. And given u, we have independent observations of the vector y from sample space 𝒴 with density h1(y|X, Z, u, φ1), for brevity written as h1(y|u, φ1).

As in Chapter 3, there are three possible cases for an observed vector y: (a) completely observed, all components of y are completely known; (b) partially observed, some but not all components of y are completely known; (c) all components are censored or totally missing.
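These three cases can be read off the mask of completely observed components. The following small Python sketch makes the classification explicit; the function name and the masks are illustrative only.

import numpy as np

def censoring_case(observed):
    # observed: boolean mask, True where a component of y is completely known
    observed = np.asarray(observed, dtype=bool)
    if observed.all():
        return "(a) completely observed"
    if observed.any():
        return "(b) partially observed"
    return "(c) all components censored or missing"

print(censoring_case([True, True, True]))     # case (a)
print(censoring_case([True, False, True]))    # case (b)
print(censoring_case([False, False, False]))  # case (c)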

We may restate model (4.1) in terms of each sample unit as below. For i = 1, 2, ..., n,

yi = Xiβ + Ziu + ei. (4.2)

The observed data thus consist of y+ = (y1+, y2+, ..., yn+), where yi+ denotes the complete part of yi. The complete data consist of x = (y1, y2, ..., yn, u).

Note that yi*, the incomplete part of yi, is not a part of the observed data. Even so, yi* may carry certain information, since yi* ∈ 𝒴i*, but that information is reflected in the many-to-one mapping x → y(x) from 𝒳 to 𝒴 of §2.1, through the fact that 𝒳(y+) = {x: y(x)+ = y+} as in equation (2.1). This conditional information is built in as we expand the sampling space from 𝒴 to 𝒳.

The complete data sampling density from (3.4) is

ƒ(x|φ) = [∏i h1(yi|u, φ1)] h2(u|φ2). (4.3)

Based on Chapter 3, the observed data sampling density given u is

g(y+|u, φ) = ∏i ∫𝒴i* h1(yi|u, φ1) dyi*. (4.4)

Then the sampling density for the observed data is

g(y+|φ) = ∫𝒰 g(y+|u, φ) h2(u|φ2) du. (4.5)

To calculate conditional expectations, we need notation for the following two conditional densities. The first is that of yi given y+ and u,

k1(yi|u, y+, φ) = h1(yi|u, φ1) / ∫𝒴i* h1(yi|u, φ1) dyi*, (4.6)

and the second is that of u given y+ and φ; denote it by k2(u|y+, φ),

k2(u|y+, φ) = g(y+|u, φ) h2(u|φ2) / g(y+|φ). (4.7)

Note that these two conditional densities are related to the density of the complete data given the observed data, defined in equation (2.3), as follows,

k(x|y+, φ) = [∏i k1(yi|u, y+, φ)] k2(u|y+, φ). (4.8)
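To make k2 concrete, consider the special case, anticipating §4.3, in which every yi is completely observed and all distributions are normal; then k2 is again a normal density, with moments given by the usual regression of u on y. The Python sketch below computes these moments with all units stacked into one vector y; it is a sketch of this special case only, not of the general censored situation.

import numpy as np

def k2_moments(y, X, Z, beta, Sigma1, Sigma2):
    # y = X beta + Z u + e with u ~ N(0, Sigma2), e ~ N(0, Sigma1), y fully observed
    V = Z @ Sigma2 @ Z.T + Sigma1            # marginal covariance of y
    W = Sigma2 @ Z.T @ np.linalg.inv(V)      # regression coefficient of u on y
    mean = W @ (y - X @ beta)                # E(u | y)
    cov = Sigma2 - W @ Z @ Sigma2            # var(u | y)
    return mean, cov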

The function Q(φ|φ(p)) can be computed as

Q(φ|φ(p)) = E(log ƒ(x|φ) | y+, φ(p)) = ∫𝒳(y+) log ƒ(x|φ) k(x|y+, φ(p)) dx. (4.9)

Note that, for given y+, the sampling space is 𝒳(y+) = 𝒴1* × ⋯ × 𝒴n* × 𝒰, as the u's are not observable.

As log ƒ(x|φ) = Σi log h1(yi|u, φ1) + log h2(u|φ2), the expectation over the censoring in (4.9) decomposes into the conditional expectations of these two parts.

From (4.9) we may formulate the general EM algorithm to maximize the observed likelihood. However, the key component of the Q function is the sampling density h1(y|u, φ1). We now illustrate the algorithm under more specific model assumptions.
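Before adding structure, a toy instance may help fix ideas. The Python sketch below runs EM for a single normal mean with right-censored scalar observations and known variance: the E-step replaces each censored value by its conditional mean, a truncated normal mean, and the M-step is a simple average. This is only an illustrative special case of the scheme above, with made-up numbers, not the algorithm developed in this chapter.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, c = 1.0, 1.5                          # known sd and censoring point
y = rng.normal(1.0, sigma, size=500)
cens = y > c                                 # for these, only "y > c" is observed
y_obs = np.where(cens, c, y)

mu = y_obs.mean()                            # Step 0: starting value
for _ in range(200):
    alpha = (c - mu) / sigma
    lam = norm.pdf(alpha) / norm.sf(alpha)          # inverse Mills ratio
    e_cens = mu + sigma * lam                       # E-step: E(y | y > c, mu)
    mu_new = np.where(cens, e_cens, y_obs).mean()   # M-step: maximize Q
    if abs(mu_new - mu) < 1e-10:
        break
    mu = mu_new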

 
4.2   Distributions in the Exponential Families

4.2.1   Parameter Representations

Suppose that both y|u and u are from exponential families of the form (2.5), with parameters φ1 and φ2 respectively. We then have the parameter vector of the form (2.5) as φ = (φ1', φ2')', which is assumed not to depend on u. The distribution of y|u does vary with the u's, but we assume that the parameter representation in the form of (2.5) does not depend on u, while the other parts t(·), a(·), b(·) may vary with u.

In the parameter representation of the form (2.5), the components might be related to each other. For example, φ1 might be a function of the covariance Σ of y|u, as

φ1 = vec(Σ⁻¹).

Therefore, let θ1 be a parameter vector consisting of all distinct parameter components from φ1, and θ2 likewise from φ2. Now the parameter vector of interest is θ = (θ1', θ2')', not a function of u. For simplicity, assume θ1 and θ2 are not functionally related to each other. Note that the representation in terms of θ is not unique. For instance, in the example above θ1 could be taken as the distinct elements of either Σ or Σ⁻¹.
Since φ is in fact a function of θ, it is better to denote the functions by φ1(θ1) and φ2(θ2) to link the two parameter representations. For notational convenience, we may simply write φ instead of φ(θ), and use φ and θ interchangeably as the parameter in many contexts.

Under this parameter representation, by changing the function φ(θ) we have great flexibility to constrain parameter components or to specify various covariance structures. Here we have essentially generalized the exponential families with respect to the conditional distribution.

Definition 4.1   Assume the Jacobian matrices of φ with respect to θ exist. Define

Δ1 = ∂φ1'(θ1)/∂θ1    and    Δ2 = ∂φ2'(θ2)/∂θ2. (4.10)

Note that Δ1 = Δ1(θ1) and Δ2 = Δ2(θ2). See Appendix for vector differentiation.

 
4.2.2   EM Algorithm Implementation

Suppose that the density functions h1(y|u, φ1) and h2(u|φ2) are

h1(y|u, φ1) = b1(y|u) exp(φ1'(θ1) t1(y|u)) / a1(φ1|u),
h2(u|φ2) = b2(u) exp(φ2'(θ2) t2(u)) / a2(φ2), (4.11)

where, as in §3.3.3, each unit i carries its own components t1i(·|u), a1i(φ1|u) and b1i(·|u); the unit index is dropped when no confusion arises. Then the complete data log density is

log ƒ(x|φ) = φ1' Σi t1i(yi|u) + φ2' t2(u) − Σi log a1i(φ1|u) − log a2(φ2) + Σi log b1i(yi|u) + log b2(u).

Since b1i(y|u) and b2(u) do not involve φ (or θ), neither do their conditional expectations given y+ and φ(p); we may therefore simply drop them from Q(φ|φ(p)) while obtaining the same φ(p+1) in the M-step. The simplified Q(φ|φ(p)) is

Q(φ|φ(p)) = φ1' Σi E(t1i(yi|u) | y+, φ(p)) − Σi E(log a1i(φ1|u) | y+, φ(p)) + φ2' E(t2(u) | y+, φ(p)) − log a2(φ2). (4.12)

Lemma 4.1   Under general regularity,

∂ log a1i(φ1|u)/∂θ1 = Δ1(θ1) E(t1i(yi|u) | u, φ1)    and    ∂ log a2(φ2)/∂θ2 = Δ2(θ2) E(t2(u) | φ2). (4.13)

Proof.   Under general regularity, by Sundberg's formulae (Sundberg, 1974),

∂ log a1i(φ1|u)/∂φ1 = E(t1i(yi|u) | u, φ1)    and    ∂ log a2(φ2)/∂φ2 = E(t2(u) | φ2);

then, by the chain rule,

∂ log a1i(φ1|u)/∂θ1 = Δ1(θ1) ∂ log a1i(φ1|u)/∂φ1    and    ∂ log a2(φ2)/∂θ2 = Δ2(θ2) ∂ log a2(φ2)/∂φ2. (4.14)

These are simple transformations of the desired equations.

Theorem 4.1   The EM algorithm for the mixed model with incomplete data from regular exponential families can be defined as follows.

Step 0.  Start with values for the parameters φ1(0) and φ2(0). Set p = 0.
Step 1. (E-step) Calculate

t1(p) = Σi E(t1i(yi|u) | y+, φ(p))    and    t2(p) = E(t2(u) | y+, φ(p)).
Step 2. (M-step) Determine φ(p+1) by finding φ1(p+1) = φ1(θ1(p+1)) as the solution of

Δ1(θ1) E(Σi E(t1i(yi|u) | u, φ1) | y+, φ(p)) = Δ1(θ1) t1(p),

and φ2(p+1) = φ2(θ2(p+1)) as the solution of

Δ2(θ2) E(t2(u) | φ2) = Δ2(θ2) t2(p).

Step 3. If converged, set φ̂ = φ(p+1); otherwise increase p by unity and repeat from Step 1.

As in Theorem 3.2, a Monte Carlo EM implementation would be as follows.

Theorem 4.2   The MCEM algorithm for the mixed model with incomplete data from regular exponential families can be defined as follows.

Step 0.  Start with values for the parameters φ1(0) and φ2(0). Set p = 0.
Step 1. Select a simulation sample size m and draw a sample of x as x(1), ..., x(m) from k(x|y+, φ(p)), where x(j) = (y1(j), ..., yn(j), u(j)) for j = 1, ..., m.
Step 2. Determine φ(p+1) by finding φ1(p+1) = φ1(θ1(p+1)) as the solution of

Δ1(θ1) (1/m) Σj Σi E(t1i(yi|u(j)) | u(j), φ1) = Δ1(θ1) (1/m) Σj Σi t1i(yi(j)|u(j)),

and φ2(p+1) = φ2(θ2(p+1)) as the solution of

Δ2(θ2) E(t2(u) | φ2) = Δ2(θ2) (1/m) Σj t2(u(j)).
Step 3. If converged, set φ̂ = φ(p+1); otherwise increase p by unity and return to Step 1.

In the Monte Carlo Step 1, if yi is complete we in fact need not draw anything, since k1(yi|u, y+, φ1) then puts all of its mass on the observed yi; in other words, just let all yi(j) = yi. Similarly, if yi has only some components incomplete, we draw only those incomplete components according to k1(yi|u(j), y+, φ1(p)) while keeping the complete components intact. In fact, the simulation sample size m may differ from iteration to iteration, which is actually desirable in practice. In the Monte Carlo Step 2, the equations usually have to be solved by numerical methods, as they are often nonlinear for advanced models under general distributions. For multivariate normal distributions, however, closed form solutions are sometimes available, as shown in Chapter 7.
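The component-wise drawing rule can be sketched as follows; here draw_incomplete is a hypothetical placeholder for a sampler of k1 (in the normal case of §4.3 it would be a conditional, truncated normal sampler).

import numpy as np

def draw_unit(y_obs, observed, u_j, draw_incomplete, rng):
    # y_obs: the unit's data vector, with placeholders in incomplete slots
    # observed: boolean mask of complete components
    observed = np.asarray(observed, dtype=bool)
    y_draw = np.array(y_obs, dtype=float)
    if observed.all():
        return y_draw                        # complete unit: no drawing needed
    idx = np.flatnonzero(~observed)
    y_draw[idx] = draw_incomplete(idx, y_draw, u_j, rng)   # redraw incomplete parts from k1
    return y_draw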

 
4.2.3   Asymptotic Dispersion Matrix of MLE

Parallel to the development of Lemma 3.5, we have the following derivatives.

Lemma 4.2.

∂ log h1(y|u, φ1)/∂θ1 = Δ1(θ1) [t1(y|u) − E(t1(y|u) | u, φ1)],
∂ log h2(u|φ2)/∂θ2 = Δ2(θ2) [t2(u) − E(t2(u) | φ2)].
Note that Lemma 3.5 gives the corresponding derivatives with respect to φ. However, under the current setting it is of interest to have the asymptotic dispersion matrix for θ̂ instead of φ̂. By equation (4.11), the derivatives needed for this are easy to obtain.

Theorem 4.3.   Assume the dimensions for θ1 and θ2 are r1 and r2 respectively.

Combining this result with (2.8) gives the Fisher information matrix and an estimate of the variance of θ̂ = (θ̂1', θ̂2')'. Also, applying Lemma A.5 to log a1i(φ1|u) and log a2(φ2) yields the required second-derivative expectations.

 
4.3   Multivariate Normal Distribution

4.3.1   Distribution Parameter Representation

For the multivariate normal distribution y ~ Nk(μ, Σ), the density function is

ƒ(y) = (2π)^(−k/2) |Σ|^(−1/2) exp(−½ (y − μ)'Σ⁻¹(y − μ)).

Expressed in the exponential family form (4.11), the density function is

h(y|φ) = b(y) exp(φ' t(y)) / a(φ),

where the form of each component depends on the parameter structure. By direct computation using equation (A.4), it is easy to verify the representations under the following two common situations.
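The key algebraic step behind these representations is the identity y'Σ⁻¹y = vec(Σ⁻¹)'vec(yy') from equation (A.4). A quick numerical check in Python, with arbitrary test values:

import numpy as np

rng = np.random.default_rng(2)
k = 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)              # a positive definite covariance
y = rng.normal(size=k)

Sinv = np.linalg.inv(Sigma)
lhs = y @ Sinv @ y                           # quadratic form y' Sigma^{-1} y
rhs = Sinv.reshape(-1, order="F") @ np.outer(y, y).reshape(-1, order="F")
assert np.isclose(lhs, rhs)                  # equals vec(Sigma^{-1})' vec(y y')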

If both μ and Σ are unknown (see the Appendix for the matrix operator vec),

φ = ((Σ⁻¹μ)', vec(Σ⁻¹)')',    t(y) = (y', −½ vec(yy')')',    a(φ) = (2π)^(k/2) |Σ|^(1/2) exp(½ μ'Σ⁻¹μ),    b(y) = 1,
and, if μ is known but only Σ is unknown,

φ = vec(Σ⁻¹),    t(y) = −½ vec((y − μ)(y − μ)'),    a(φ) = (2π)^(k/2) |Σ|^(1/2),    b(y) = 1.
Note that φ is a function of θ, as φ(θ), and the parameter vector θ is formed with all distinct components of φ.

For the mixed model (4.1), assume that e and u are independent and follow the multivariate normal distributions e ~ Nk(0, Σ1) and u ~ Nq(0, Σ2). Then, given u, we have the conditional distribution y|u ~ Nk(Xβ + Zu, Σ1); that is,

h1(y|u, φ1) = (2π)^(−k/2) |Σ1|^(−1/2) exp(−½ (y − Xβ − Zu)'Σ1⁻¹(y − Xβ − Zu)),

where the parameter φ1 = φ1(θ1) and θ1 consists of all distinct parameters within β and Σ1, i.e. β = β(θ1), Σ1 = Σ1(θ1).

Lemma 4.3.   Let μ = Xβ + Zu and Q = (y − Xβ − Zu)'Σ1⁻¹(y − Xβ − Zu); then

Q = vec(Σ1⁻¹)'vec(yy') − 2μ'Σ1⁻¹y + μ'Σ1⁻¹μ.

Proof.   Since Q is a scalar, Q = vec(Q) gives

Q = vec(y'Σ1⁻¹y) − 2vec(μ'Σ1⁻¹y) + vec(μ'Σ1⁻¹μ).

Again, since each term is a scalar, using equation (A.4) we get the desired result.

In order to write h1(y|u, φ1) in the form (4.11), we need φ1(θ1), t1(y|u), a1(φ1|u) and b1(y|u) for each h1i(yi|u, φ1). From this Lemma, these terms are as given below.

Lemma 4.4.   Each h1i(yi|u, φ1) can be written as

h1(y|u, φ1) = b1(y|u) exp(φ1'(θ1) t1(y|u)) / a1(φ1|u),

where
(4.15)
(4.16)
a1(φ1|u) = (2π)^(k/2) |Σ1|^(1/2) exp(½ (Xβ + Zu)'Σ1⁻¹(Xβ + Zu)),    b1(y|u) = 1. (4.17)

For the distribution of u ~ Nq(0, Σ2),

h2(u|φ2) = (2π)^(−q/2) |Σ2|^(−1/2) exp(−½ u'Σ2⁻¹u),

where the parameter vector φ2 = φ2(θ2) and θ2 consists of all distinct parameters of Σ2, that is Σ2 = Σ2(θ2).

Lemma 4.5.   h2(u|φ2) can be written in the form of (4.11) as

h2(u|φ2) = b2(u) exp(φ2'(θ2) t2(u)) / a2(φ2),

where
φ2(θ2) = vec(Σ2⁻¹),    t2(u) = −½ u ⊗ u,    a2(φ2) = (2π)^(q/2) |Σ2|^(1/2),    b2(u) = 1. (4.18)

 
Calculation of the Jacobian Matrix Δ1(θ1)

Let

Δβ = ∂β'(θ1)/∂θ1    and    Δσ = ∂vec(Σ1)'(θ1)/∂θ1. (4.19)

By equation (A.23) in the Appendix, it is easy to show that

∂vec(Σ1⁻¹)'/∂θ1 = −Δσ (Σ1⁻¹ ⊗ Σ1⁻¹). (4.20)

From (4.15), we have


Lemma 4.6.   For φ1(θ1) above, the Jacobian matrix Δ1(θ1) is

(4.21)

where the constant matrix P1,p,k² is defined in (A.8).

Proof.   By the chain rule (A.15), we have the Jacobian matrix defined at (4.10) as

Then by equation (4.19),

Using the matrix P1,p,k² and equation (A.12) in the Appendix, we have the result


 
4.3.2   EM Algorithm Formulation

For notational convenience, we now introduce several symbols.

Definition 4.2.   Define the following constant matrices,

(4.22)

Definition 4.3.   Define the following matrix functions of u,

(4.23)

Definition 4.4.   Given k2 as in (4.7), define the following conditional expectations with respect to k2(u|y+, φ(p)),

ū(p) = E(u | y+, φ(p)),    su(p) = E(u ⊗ u | y+, φ(p)). (4.24)

Definition 4.5.   Define the following conditional expectations,

(4.25)

Definition 4.6.   Let ỹi = yi − Ziu. Define

(4.26)

EM Algorithm Formulation: E-step

By Theorem 4.1, t1(p) and t2(p) need to be calculated in the E-step.

Lemma 4.7.
(4.27)

Proof.   By (4.16) and (4.23), we have
(4.28)

Then the result is obtained by applying (4.25).

Lemma 4.8.
E(t2(u) | y+, φ(p)) = −½ su(p). (4.29)

Proof.   Since t2(u) = −½ u ⊗ u = −½ vec(uu'), the result holds directly by Definition 4.4, equation (4.24).

As shown above, the core calculation involves the conditional expectations of yi and yiyi' over the incomplete components, yi* ∈ 𝒴i*. In general,

(4.30)
(4.31)

EM Algorithm Formulation: M-step

By Theorem 4.1, E(t1(y|u) | u, φ1) and E(t2(u) | φ2) need to be calculated.

Lemma 4.9.
E(t2(u) | φ2) = −½ vec(Σ2). (4.32)

Proof.   Since E(uu' | φ2) = Σ2, we have the result

E(t2(u) | φ2) = −½ vec(E(uu' | φ2)) = −½ vec(Σ2).

Lemma 4.10.

Proof.   Recall that in equation (4.16), we have

Since E(y|u, φ1) = Xβ + Zu and E(yy'|u, φ1) = Σ1 + (Xβ + Zu)(Xβ + Zu)', then

and recall that X is of order k × p, then

Therefore we have the result,

Lemma 4.11.

Proof.   By the previous Lemma and equations (4.22) and (4.25),

Lemma 4.12.   For any matrices A of order 1 × k² and B of order p × k²,

Proof.   By property (A.10) of the matrix P1,p,k² defined at (A.8) in the Appendix,

From equation(4.21), we know that


Repeatedly applying (A.4) yields

Since A + β'B is a row vector and B vec(Σ1⁻¹) is a (column) vector, we have

Lemma 4.13.   The M-step seeks the solution φ1 = φ1(p+1) = φ1(θ1(p+1)) of

(4.33)

where Δσ and Δβ are defined as in equation (4.19).

Proof.   By Theorem 4.1, the M-step requires finding φ1(p+1), or equivalently θ1(p+1), as the solution of the following equations,

By Lemma 4.11 and Lemma 4.7, this is equivalent to

Recall that β = β(θ1) and Σ1 = Σ1(θ1), so we need to solve for φ1 = φ1(p+1) from

(4.34)

Now apply Lemma 4.12 to both sides of equation (4.34) above. First note that, by (A.4) and (A.1), both sides take the form required by Lemma 4.12. The left hand side of (4.34) is

By equations (A.1) and (A.4), it is easy to verify the two identities needed here. Therefore the left hand side of (4.34) becomes

and similarly the right hand side is

So the M-step seeks the solution of φ1 = φ1(p+1) from

as expected.

Note that the second matrix in the brackets above is in fact, by (A.4) and (A.6), equal to the conditional expectation, given y+ and φ1(p), of the following matrix,

(4.35)

EM Algorithm Formulation: Procedure

Depending on the parameter representation, specified through the Jacobian matrices Δ1 and Δ2 at (4.10), the following EM algorithm is applicable to a very broad class of mixed models; in particular, it can handle different covariance structures. These matrices can be calculated as functions of θ; see equation (4.19) for Δ1.

Based on the Lemmas above, we can summarize the EM algorithm as follows.

Theorem 4.4   The EM algorithm for the mixed model with incomplete data from the normal distribution is defined below. Note that β = β(θ1), Σ1 = Σ1(θ1) and Σ2 = Σ2(θ2).

Step 0.  Start with values for the parameters φ1(0) and φ2(0). Set p = 0.
Step 1. (E-step) Calculate conditional expectations as given in (4.26) and (4.24),

(4.36)

Step 2. (M-step) Determine φ(p+1) by φ1(p+1) = φ1(θ1(p+1)) as the solution of (4.33),

(4.37)
and φ2(p+1) = φ2(θ2(p+1)) as the solution of (see equations (4.29) and (4.32))

Δ2(θ2) vec(Σ2) = Δ2(θ2) su(p). (4.38)

Step 3. If converged, set φ̂ = φ(p+1); otherwise increase p by 1 and return to Step 1.
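When Σ2 is completely unstructured, Δ2 has full row rank and (4.38) reduces to vec(Σ2(p+1)) = su(p). With Monte Carlo draws of u from k2, assumed available from the E-step, the update is one line, as in this sketch.

import numpy as np

def update_Sigma2(u_draws):
    # u_draws: (m, q) array of draws of u from k2(u | y+, phi(p))
    m = u_draws.shape[0]
    return u_draws.T @ u_draws / m           # Monte Carlo estimate of E(uu' | y+, phi(p))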

A Monte Carlo implementation of the EM algorithm would differ only in the E-step.

Theorem 4.5   The Monte Carlo E-step is given as follows.

(1.1)   Select a simulation sample size m and draw a sample of x as x(1), x(2), ..., x(m) from k(x|y+, φ(p)), where x(j) = (y1(j), ..., yn(j), u(j)) for j = 1, ..., m.

(1.2) Calculate
(4.39)

Plugging these into the M-step gives φ(p+1); the conditional distributions are then updated using φ(p+1), and the algorithm is iterated until it converges.

Again, the drawing mechanism in Monte Carlo step (1.1) is stated in full generality. If yi is complete, we in fact do not need to draw any sample; just let all yi(j) = yi. Similarly, if yi has only some components incomplete, we draw only those incomplete components according to k1(yi|u(j), y+, φ1(p)) while keeping the complete components intact.

4.3.3   Asymptotic Dispersion Matrix of MLE

Recall that the following Jacobian matrices were defined previously as

By (4.17), log a1(φ1|u) = (k/2) log(2π) + ½ log|Σ1| + ½ (Xβ + Zu)'Σ1⁻¹(Xβ + Zu). Apply equation (A.27) to |Σ1| and use equation (4.19),

Denote μ = Xβ + Zu. For the quadratic form

The last step above is due to equation (A.4) and the fact that X'μ is a column vector and its transpose μ'X is a row vector. Therefore the first derivative is

(4.40)

For the second derivative, first calculate the following two parts using Lemma A.4.

And

Hence the second derivative is

(4.41)

For h2(u), from Lemma 4.5 we have log a2(φ2) = (q/2) log(2π) + ½ log|Σ2|. Apply equation (A.27) to |Σ2|; then the first derivative is

(4.42)

And the second derivative is

(4.43)

Substituting these derivative calculations into Theorem 4.3 and then using the result in (2.8) gives the Fisher information matrix and an estimate of the variance of θ̂ = (θ̂1', θ̂2')'.

These results are complicated, and some non-trivial results are implied here. Taking θ = φ yields the expected vector and covariance matrix of the sufficient statistics t in any dimension. For example, E(t2) = −½ vec(Σ2) and var(t2) = ½ Σ2 ⊗ Σ2, which gives var(vec(uu')) = 2 Σ2 ⊗ Σ2 when u ~ Nq(0, Σ2). On the other hand, each component of θ usually comes from either the mean vector or the variance matrix, but not both, which implies that the corresponding cross-derivative blocks are often 0.
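As a check on the covariance formula, contracting var(vec(uu')) = 2 Σ2 ⊗ Σ2 along a symmetric direction a ⊗ a gives var((a'u)²) = 2(a'Σ2a)², which is easy to verify by simulation; the numbers below are illustrative and agreement is up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(3)
Sigma2 = np.array([[1.0, 0.3], [0.3, 0.5]])
a = np.array([0.7, -1.2])
m = 400_000
u = rng.multivariate_normal(np.zeros(2), Sigma2, size=m)
w = u @ a                                      # a'u ~ N(0, a' Sigma2 a)
print(np.var(w**2), 2 * (a @ Sigma2 @ a)**2)   # nearly equal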

 
