\documentclass[12pt,a4paper,notitlepage]{article}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{indentfirst}
\usepackage{geometry}
\usepackage[onehalfspacing]{setspace}
\setcounter{MaxMatrixCols}{10}
\geometry{top=1.5in, bottom=1.5in, left=1in, right=1in}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}
\input{tcilatex}
\begin{document}
\begin{titlepage}
\title{Review econometrics}
\author{Lazy}
\maketitle
\thispagestyle{empty}
\begin{center}
{\Large Foreword}
\end{center}
Although I have spent more than 30 hrs preparing these review notes (on
average 3 hrs per chapter), I cannot promise you that they are error-free. It
would be best to question anything that seems unusual or unfamiliar.\newline
The first thing I want to share with you is not the technical knowledge of
how to calculate, but how to study econometrics efficiently:
\begin{enumerate}
\item Learn the assumptions of the models
\item Try to understand the proofs (but don't overdo it!)
\item Memorize the main results
\item Remember the limitations, testing and forecasting of the models
\item Do all exercises available! (including past papers!)
\end{enumerate}
Instead of putting the reference materials at the back, I have decided to put
them on the front cover:
\begin{enumerate}
\item Econometric Analysis (5th edition) by William H. Greene, Prentice Hall
\item Introduction to Linear Regression Analysis (3rd edition) by Douglas C.
Montgomery, Elizabeth A. Peck and G. Geoffrey Vining, Wiley Interscience
Publication
\item Statistical Inference (2nd edition) by George Casella and Roger L.
Berger, Duxbury
\end{enumerate}
Finally, let me wish you good luck in the exam!
\end{titlepage}
\bigskip
\pagenumbering{roman}
\tableofcontents
\newpage
\pagenumbering{arabic}
\section{A Review of Probability and Statistics}
\subsection{Sample space, random experiment, events and probability}
Sample~Space$~\overset{\text{Random~Experiment}}{\Longrightarrow }~$
Elementary~Event $\overset{\text{combination}}{\implies }~$Event$~\overset{%
\text{assignment}}{\implies }~$Probability
\begin{enumerate}
\item Sample Space: all possible outcomes of a random experiment
\item Random Experiment: (i) all possible outcomes known (ii) outcome of a
particular trial not known (iii) can be repeated
\item Elementary Event: a particular outcome of a random experiment
\item Event: elementary event(s), a subset of the sample space
\item Probability: numbers from 0 to 1 assigned to events
\end{enumerate}
\subsection{Kolmogorov's definition of probability}
This is a technical definition to eliminate any possible paradox and
ambiguity.
\begin{enumerate}
\item $0\leq \Pr \left( E\right) \leq 1$
\item $\Pr (S)=1$
\item $\Pr (A\cup B)=\Pr \left( A\right) +\Pr \left( B\right) $ if $A\cap
B=\varnothing .$
\end{enumerate}
where $S$ is the sample space and $E,~A,~B$ are events; $\cup $ and $\cap ~$denote
union and intersection, and $\varnothing $ denotes the empty set.
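As a quick illustration of these axioms, take one roll of a fair die, so $S=\left\{ 1,2,...,6\right\} :$%
\begin{equation*}
0\leq \Pr \left( \left\{ 1\right\} \right) =\tfrac{1}{6}\leq 1,\qquad \Pr
\left( S\right) =1,\qquad \Pr \left( \left\{ 1\right\} \cup \left\{
2\right\} \right) =\tfrac{1}{6}+\tfrac{1}{6}=\tfrac{1}{3}
\end{equation*}
where the last equality uses the third axiom since $\left\{ 1\right\} $ and $\left\{ 2\right\} $ are disjoint.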
\subsection{Conditional Probability and Statistical Independence}
This can be thought of as a reduction in the sample space. If we know something has
happened, we also know something else has not happened. Therefore, the original
probability should be revised. The revision is done using the concept of
conditional probability.
\begin{equation*}
\Pr \left( A\mid B\right) =\frac{\Pr (A\cap B)}{\Pr \left( B\right) }
\end{equation*}
From conditional probability, we can derive the concept of independence. If the
realization of event $A$ has no effect on event $B$, we call the two events
statistically independent. That is,%
\begin{eqnarray*}
&&\Pr \left( A\mid B\right) =\Pr \left( A\right) \text{ and }\Pr \left(
B\mid A\right) =\Pr \left( B\right) \\
&&\left[ \Rightarrow \Pr (A\cap B)=\Pr \left( A\right) \times \Pr \left(
B\right) \right]
\end{eqnarray*}
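For example, roll a fair die once and let $A=\left\{ 2,4,6\right\} $ (an even
outcome) and $B=\left\{ 1,2\right\} .$ Then%
\begin{equation*}
\Pr \left( A\mid B\right) =\frac{\Pr (A\cap B)}{\Pr \left( B\right) }=\frac{1/6}{1/3}=\frac{1}{2}=\Pr \left( A\right)
\end{equation*}
so $A$ and $B$ happen to be statistically independent in this example.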
\subsection{Random Variable}
This abstracts the concept of events. Here, we usually want events to
be numbers rather than heads, tails, diamond, club, heart and spade.
Formally, a random variable is the method (function) of assigning non-numbers
(events) to numbers (values of the random variable).
\begin{center}
Elementary Event$~\overset{\text{random variable}}{\implies }~$Numbers
\end{center}
If the numbers representing events are discrete, it is a discrete random variable.
If the numbers representing events are continuous, it is a continuous random
variable.
How can we describe a random variable? By its distribution function.
\begin{equation*}
F\left( a\right) =\Pr \left( X\leq a\right)
\end{equation*}
How does the distribution function relate to probability?
Probability mass function for discrete random variable.
\begin{eqnarray*}
\Pr \left( X=a\right) &=&f\left( a\right) \\
F\left( a\right) &=&\sum_{x\leq a}f\left( x\right)
\end{eqnarray*}
Probability density function for continuous random variable.
\begin{eqnarray*}
\Pr \left( a\leq X\leq b\right) &=&\int_{a}^{b}f\left( x\right) dx \\
F\left( a\right) &=&\int_{-\infty }^{a}f\left( x\right) dx
\end{eqnarray*}
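Two small illustrations: for a fair die (discrete) and for a random variable
with density $f\left( x\right) =1$ on $\left[ 0,1\right] $ (continuous),%
\begin{eqnarray*}
F\left( 2\right) &=&\Pr \left( X\leq 2\right) =f\left( 1\right) +f\left(
2\right) =\tfrac{1}{6}+\tfrac{1}{6}=\tfrac{1}{3} \\
\Pr \left( 0.2\leq X\leq 0.5\right) &=&\int_{0.2}^{0.5}1\,dx=0.3
\end{eqnarray*}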
How can a random variable become more complicated? Add more random variables. Then you
have to use a joint distribution to describe them.%
\begin{equation*}
F\left( a,b\right) =\Pr \left( X\leq a,Y\leq b\right)
\end{equation*}
Of course, you get a joint density function if both random variables are
continuous, where you will need to compute double integrals or partial
derivatives to work with them.
\subsection{Functions on random variable}
\begin{enumerate}
\item Mean (1st moment, mathematical expectation)
Central Tendency: A single number to represent data.%
\begin{equation*}
E(X)=\left\{
\begin{array}{c}
\underset{x\in S}{\sum }xf(x) \\
\int_{S}xf(x)dx%
\end{array}%
\right.
\begin{array}{c}
\text{ if }X\text{ is discrete} \\
\text{if }X\text{ is continuous}%
\end{array}%
\end{equation*}
\item Variance (2nd moment about mean)
Dispersion: A number to describe the spread of data. Accuracy of mean.%
\begin{equation*}
Var\left( X\right) =E(X-E(X))^{2}=\left\{
\begin{array}{c}
\underset{x\in S}{\sum }(x-E\left( X\right) )^{2}f(x) \\
\int_{S}(x-E\left( X\right) )^{2}f(x)dx%
\end{array}%
\right.
\begin{array}{c}
\text{ if }X\text{ is discrete} \\
\text{if }X\text{ is continuous}%
\end{array}%
\end{equation*}
\item Covariance
The relationship between two variables. Positive means they move in the same
direction and negative means they move in opposite directions. Zero means no
(linear) relationship.%
\begin{equation*}
Cov\left( X,Y\right) =E\left[ \left( X-E\left( X\right) \right) \left(
Y-E\left( Y\right) \right) \right] =E\left( XY\right) -E\left( X\right)
E\left( Y\right)
\end{equation*}
\item Correlation coefficient
Normalized covariance. It only assumes values in $\left[ -1,1\right] .$ The
rescaling is done by dividing by the standard deviations of both variables (a
worked example follows this list).%
\begin{equation*}
\rho _{xy}=\frac{Cov\left( X,Y\right) }{\sqrt{Var\left( X\right) }\sqrt{%
Var\left( Y\right) }}
\end{equation*}%
\newpage
\end{enumerate}
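A small worked example with made-up variables: let $X$ equal $1$ for heads and $0$ for tails of a fair coin, and let $Y=1-X.$ Then%
\begin{eqnarray*}
E\left( X\right) &=&E\left( Y\right) =\tfrac{1}{2},\qquad Var\left( X\right)
=Var\left( Y\right) =E\left( X^{2}\right) -\left[ E\left( X\right) \right]
^{2}=\tfrac{1}{2}-\tfrac{1}{4}=\tfrac{1}{4} \\
Cov\left( X,Y\right) &=&E\left( XY\right) -E\left( X\right) E\left( Y\right)
=0-\tfrac{1}{4}=-\tfrac{1}{4},\qquad \rho _{xy}=\frac{-1/4}{\sqrt{1/4}\sqrt{1/4}}=-1
\end{eqnarray*}
so $X$ and $Y$ are perfectly negatively correlated, as expected since $Y$ always moves exactly opposite to $X.$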
\section{Special probability distribution}
This chapter only covers continuous random variables. Basically, most of the
distributions needed are variations of the normal distribution.
$%
\begin{array}{cccccc}
\text{Normal} &
\begin{array}{c}
\\
Z=\frac{X-\mu }{\sigma } \\
\longrightarrow \\
\\
\end{array}
&
\begin{array}{c}
\text{Standard } \\
\text{Normal}%
\end{array}
&
\begin{array}{c}
\chi _{k}^{2}=\underset{i=1}{\overset{k}{\sum }}Z_{i}^{2} \\
\longrightarrow \\
\\
\end{array}
& \text{Chi-square} &
\begin{array}{c}
\text{Student's t} \\
\nearrow t_{k}=\frac{Z}{\sqrt{\chi _{k}^{2}/k}} \\
\\
\searrow F=\frac{\chi _{m}^{2}/m}{\chi _{n}^{2}/n} \\
\text{F}%
\end{array}%
\end{array}%
$
\subsection{Uniform distribution}
By the name, we know its density should be uniform over its range. So,
if $X\sim U\left( a\right) $, the density function should be%
\begin{equation*}
f\left( x\right) =\left\{
\begin{array}{c}
\frac{1}{a} \\
0%
\end{array}%
\right.
\begin{array}{c}
\text{if }0\leq x\leq a \\
\text{otherwise}%
\end{array}%
\end{equation*}
Use: to model events that might happen in an equally likely manner.
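For example, plugging this density into the mean formula of Section 1 gives%
\begin{equation*}
E\left( X\right) =\int_{0}^{a}x\cdot \frac{1}{a}dx=\frac{a}{2}
\end{equation*}
so on average the outcome lies in the middle of the range, as intuition suggests.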
\subsection{Exponential distribution}
Again, by the name, we know its density should be in exponential form. So,
if $X\sim $exponential$\left( \theta \right) $, the density function should
be%
\begin{equation*}
f\left( x\right) =\left\{
\begin{array}{c}
\frac{1}{\theta }e^{-x/\theta } \\
0%
\end{array}%
\right.
\begin{array}{c}
\text{if }0\leq x<\infty \\
\text{otherwise}%
\end{array}%
\end{equation*}
Use: to model radioactive decay or the time for a light bulb to burn out.
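For example, the probability that a light bulb with $X\sim $exponential$\left( \theta \right) $ survives beyond time $t$ is%
\begin{equation*}
\Pr \left( X>t\right) =\int_{t}^{\infty }\frac{1}{\theta }e^{-x/\theta
}dx=e^{-t/\theta }
\end{equation*}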
\subsection{Normal distribution}
First used by de Moivre to approximate the binomial distribution for a large
number of trials (large $n$). Later, Laplace and Gauss used it to model the
errors of experiments. Therefore, it is also sometimes referred to as the Gaussian
distribution. The density function might look intimidating at first sight, but
it will look much friendlier and more `normal' once we look at it more.
If $X\sim N\left( \mu ,\sigma ^{2}\right) ,$ the density function will be%
\begin{equation*}
f\left( x\right) =\frac{1}{\sigma \sqrt{2\pi }}\exp \left\{ -\frac{1}{2}\left( \frac{x-\mu }{\sigma }\right) ^{2}\right\} ,\qquad -\infty <x<\infty
\end{equation*}
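For example (hypothetical numbers), if $X\sim N\left( 5,4\right) $ so that $\mu =5$ and $\sigma =2,$ the standardization in the diagram above gives%
\begin{equation*}
\Pr \left( X\leq 7\right) =\Pr \left( \frac{X-5}{2}\leq \frac{7-5}{2}\right)
=\Pr \left( Z\leq 1\right) \approx 0.8413
\end{equation*}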
\begin{enumerate}
\item White test
This test is similar to the B-P test except that we do not need to know the
formula for the variance of the error. We assume the error to be a function
of the explanatory variables; that is, we model the variance of the error
using the explanatory variables, their squares and their cross products.%
\begin{eqnarray*}
y_{t} &=&\beta _{0}+\beta _{1}x_{1t}+\beta _{2}x_{2t}+...+\beta
_{k}x_{kt}+u_{t} \\
\sigma _{t}^{2} &=&\alpha _{0}+\alpha _{1}x_{1t}+\alpha
_{2}x_{2t}+...+\alpha _{k}x_{kt}+\alpha _{k+1}x_{1t}^{2}+...+\alpha
_{2k}x_{kt}^{2} \\
&&+\alpha _{2k+1}x_{1t}x_{2t}+...+\alpha _{\frac{k(k+3)}{2}}x_{\left(
k-1\right) t}x_{kt}+v_{t}
\end{eqnarray*}
The hypothesis to be tested is%
\begin{eqnarray*}
H_{0} &:&\alpha _{1}=\alpha _{2}=...=\alpha _{\frac{k(k+3)}{2}}=0 \\
H_{1} &:&\text{at least one of }\alpha _{1},...,\alpha _{\frac{k(k+3)}{2}}%
\text{ is not zero}
\end{eqnarray*}
So the procedure is:
\begin{enumerate}
\item regress $y_{t}$ on $x_{1t},x_{2t},...,x_{kt}~$to get the OLS estimates,
the residuals $\hat{u}_{t}$ and%
\begin{equation*}
\hat{\sigma}^{2}=\frac{1}{T}\sum \hat{u}_{t}^{2}
\end{equation*}
\item regress $\frac{\hat{u}_{t}^{2}}{\hat{\sigma}^{2}}$ on any
combination of $x_{1},...,x_{k}$; for example, we regress $\frac{\hat{u}%
_{t}^{2}}{\hat{\sigma}^{2}}$ on the explanatory variables $%
x_{1t},x_{2t},...,x_{kt}$, their squares $x_{1t}^{2},...,x_{kt}^{2}$ and
their cross products $x_{1t}x_{2t},...,x_{\left( k-1\right) t}x_{kt}$
(auxiliary regression)
\item The test is again a chi-square test with $\frac{k(k+3)}{2}~$degrees of
freedom (the d.f. equals the number of slope regressors used in the auxiliary
regression). The test statistic is the unadjusted R-square $R^{2}$
times the sample size$~T,$~that is,$~TR^{2}$. That is, reject if (see the
worked example after this list)
\begin{equation*}
TR^{2}>\chi _{\frac{k(k+3)}{2}}^{2}\left( \alpha \right)
\end{equation*}%
\newpage
\end{enumerate}
\end{enumerate}
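As a worked illustration of the degrees of freedom (a hypothetical model with
$k=2$ regressors): the auxiliary regression contains $x_{1t},$ $x_{2t},$ $x_{1t}^{2},$ $x_{2t}^{2}$ and $x_{1t}x_{2t},$ that is $\frac{k(k+3)}{2}=5$ slope coefficients, so we reject homoskedasticity at level $\alpha $ if%
\begin{equation*}
TR^{2}>\chi _{5}^{2}\left( \alpha \right)
\end{equation*}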
\section{Serial Correlation}
Very often, for time series data (data collected on the same object across
time), the errors of different periods are correlated. This kind of phenomenon is
called autocorrelation. Take stock prices as an example: over-optimism or
irrational exuberance may stimulate the stock price for a certain period
of time, but this kind of exogenous psychological factor is usually not
included in the regression model. Hence, the excluded explanatory variables,
implicitly hidden in the error terms, are not random across time, which leads to
autocorrelation in the errors.
\subsection{Consequence}
\begin{enumerate}
\item still unbiased
\item but no longer efficient
\item and the usual standard errors are biased, so the usual t and F tests are unreliable
\end{enumerate}
\subsection{Estimation}
\begin{enumerate}
\item Cochrane-Orcutt Iterative Procedure (COIP)
Suppose we know the model is%
\begin{eqnarray*}
y_{t} &=&\beta _{0}+\beta _{1}x_{1t}+...+\beta _{k}x_{kt}+u_{t} \\
u_{t} &=&\rho u_{t-1}+\varepsilon _{t},\text{ }-1<\rho <1
\end{eqnarray*}
After we know the value of $\rho ,$ we can take the
quasi-difference of the data directly so as to remove the autocorrelation of
the error term. It should be noted that the sample size would fall
by 1, as there is nothing to difference for the data at the initial time (see the
numerical illustration after this list). Formally, the procedure is outlined as follows:
\begin{enumerate}
\item Get the value of $\rho .$
\begin{enumerate}
\item If we already know it, we are done.
\item If we don't know it, we are going to estimate it.
We will have to get the estimate $\hat{\rho}$ by running two regressions, which
is very similar to what we have done in the last section.
\begin{enumerate}
\item Estimate the original regression and get the estimate $\hat{u}_{t}$%
\begin{equation*}
\hat{u}_{t}=y_{t}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{1t}-...-\hat{\beta}%
_{k}x_{kt}
\end{equation*}
\item Run an auxiliary regression of the residual on its lag to get the estimate
$\hat{\rho}$
\begin{equation*}
\text{ }\hat{u}_{t}=\rho \hat{u}_{t-1}+\varepsilon _{t}
\end{equation*}
\end{enumerate}
\end{enumerate}
\item lag the regression equation by one period%
\begin{equation*}
y_{t-1}=\beta _{0}+\beta _{1}x_{1(t-1)}+...+\beta _{k}x_{k\left( t-1\right)
}+u_{t-1}
\end{equation*}
\item multiply by $\rho $ (or $\hat{\rho})$%
\begin{equation*}
\rho y_{t-1}=\rho \beta _{0}+\rho \beta _{1}x_{1(t-1)}+...+\rho \beta
_{k}x_{k\left( t-1\right) }+\rho u_{t-1}
\end{equation*}
\item subtract it from the original regression equation%
\begin{equation*}
y_{t}-\rho y_{t-1}=\beta _{0}\left( 1-\rho \right) +\beta _{1}\left(
x_{1t}-\rho x_{1(t-1)}\right) +...+\beta _{k}\left( x_{kt}-\rho x_{k\left(
t-1\right) }\right) +\left( u_{t}-\rho u_{t-1}\right)
\end{equation*}
\item estimate this new equation%
\begin{equation*}
y_{t}^{\ast }=\alpha _{0}+\alpha _{1}x_{1t}^{\ast }+...+\alpha
_{k}x_{kt}^{\ast }+u_{t}^{\ast }
\end{equation*}
where quasi-differenced data are $y_{t}^{\ast }=y_{t}-\rho y_{t-1},$ $%
x_{1t}^{\ast }=x_{1t}-\rho x_{1(t-1)},...,x_{kt}^{\ast }=x_{kt}-\rho
x_{k\left( t-1\right) }$
and new coefficients are $\alpha _{0}=\beta _{0}\left( 1-\rho \right) ,\,\
\alpha _{1}=\beta _{1},...,$ $\alpha _{k}=\beta _{k},$ $u_{t}^{\ast
}=\varepsilon _{t}=u_{t}-\rho u_{t-1}.$
\end{enumerate}
\item Hildreth-Lu Search Procedure
Nothing special, just search and search. This numerical method does not use a
regression to estimate the value of $\rho $ but directly varies the value
of $\rho $ to minimize the error sum of squares of the original regression equation.
\end{enumerate}
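A tiny numerical illustration of the quasi-differencing step (made-up numbers):
suppose $\hat{\rho}=0.5$ and the first two observations are $y_{1}=10,$ $y_{2}=12,$ $x_{11}=4,$ $x_{12}=6.$ Then%
\begin{equation*}
y_{2}^{\ast }=y_{2}-0.5y_{1}=12-5=7,\qquad x_{12}^{\ast
}=x_{12}-0.5x_{11}=6-2=4
\end{equation*}
and there is no $y_{1}^{\ast },$ which is why the sample size falls by one.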
\subsection{Testing}
\begin{enumerate}
\item Durbin-Watson Test (DW Test)
This is a test for first-order autocorrelation. One
feature (or drawback) of this test is that it has an inconclusive region, in which
we cannot draw any conclusion about the null and alternative
hypotheses.
The test statistic is%
\begin{equation*}
d=\frac{\underset{t=2}{\overset{T}{\sum }}\left( \hat{u}_{t}-\hat{u}_{t-1}\right) ^{2}}{%
\underset{t=1}{\overset{T}{\sum }}\hat{u}_{t}^{2}}
\end{equation*}
which is close to 2 if there is no autocorrelation (a rough numerical
illustration is given after this list).
There are two types of D-W test; the difference is whether the
alternative hypothesis is that the residuals are positively correlated or
negatively correlated. The choice of alternative hypothesis
is based on theory or preconception.
\begin{enumerate}
\item Alternative hypothesis: the residuals are positively correlated
\begin{eqnarray*}
H_{0} &:&\rho =0 \\
H_{1} &:&\rho >0
\end{eqnarray*}
After we find $d_{L}$ and $d_{U}$ from the table (both depend on the sample
size $T$ and the number of explanatory variables $k$), our decision rule on $%
H_{0}$ is
\begin{tabular}{|c|c|c|c|c|}
\hline
${\small 0\leq d<d}_{L}$ & ${\small d}_{L}{\small \leq d\leq d}_{U}$ & ${\small d}_{U}{\small <d<4-d}_{U}$ & ${\small 4-d}_{U}{\small \leq d\leq 4-d}_{L}$ & ${\small 4-d}_{L}{\small <d\leq 4}$ \\
\hline
{\small reject (positive)} & {\small inconclusive} & {\small do not reject} & {\small inconclusive} & {\small reject (negative)} \\
\hline
\end{tabular}
\end{enumerate}
\end{enumerate}
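A rough numerical illustration, using the commonly quoted approximation $d\approx 2\left( 1-\hat{\rho}\right) $ and a hypothetical value $d=0.8$:%
\begin{equation*}
\hat{\rho}\approx 1-\frac{d}{2}=1-\frac{0.8}{2}=0.6
\end{equation*}
so a $d$ far below 2 points towards positive autocorrelation, and we would compare $d$ with $d_{L}$ and $d_{U}$ to make the formal decision.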
\newpage
\section{Discrete and Limited dependent variable}
Dummy variables are for categorical and nominal explanatory variables. Now, we want to
study how to model the case where the limitation or restriction is on the
explained variable. Discrete means that the value of the explained variable
can only assume a few possible values, such as the number of children, and limited
means that the value of the explained variable can only lie within some range of
numbers, such as a probability.
\subsection{Problem of using linear model}
\begin{enumerate}
\item estimates outside the acceptable range
\item heteroskedasticity
\item inefficient estimate
\item non-normality of errors
\item problematic explanation of $R^{2}$
\end{enumerate}
\subsection{Probit and logit model}
Since the linear model would have so many problems, we have to drop this
simplification. Remember why we do regression? Yes, we want to estimate
the functional form $f$ such that $y=f\left( x_{1},x_{2},...,x_{k}\right) .$
In the previous chapters, we used the most basic linear form. However, the linear
form places no restriction on the range of the function (range = set of all
possible values of $y$).
If we put a restriction on the linear function, it is no longer linear.
Hence we have no choice but to adopt other functional forms which would
allow a restriction on the possible values of $y.$ For example, if $y$ is a
probability, which can only assume values from zero to one, we then restrict
the family of functions to the set of functions which map numbers to the $\left[
0,1\right] $ interval. In particular, the most readily used functional form
we have learnt is the probability function.
If the probability function used is the normal distribution function, it is
called the probit model.
If the probability function used is the logistic distribution function, it is
called the logit model.
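Concretely, writing the linear index as $\beta _{0}+\beta _{1}x_{1}+...+\beta
_{k}x_{k}$, the two models are usually written as%
\begin{eqnarray*}
\text{probit} &:&\Pr \left( y=1\right) =\Phi \left( \beta _{0}+\beta
_{1}x_{1}+...+\beta _{k}x_{k}\right) \\
\text{logit} &:&\Pr \left( y=1\right) =\frac{1}{1+e^{-\left( \beta _{0}+\beta
_{1}x_{1}+...+\beta _{k}x_{k}\right) }}
\end{eqnarray*}
where $\Phi $ is the standard normal distribution function; both functions map any real number into $\left[ 0,1\right] .$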
\subsection{Least Squares Estimation}
In the linear regression model, our estimation method is ordinary least squares
(OLS) or weighted least squares (WLS). The least squares (LS) estimation method is
actually an unconstrained minimization.
\begin{equation*}
\underset{\beta _{0},..\beta _{k}}{\min }\underset{t=1}{\overset{T}{\sum }}%
\left[ y_{t}-f\left( x_{1t},x_{2t},...,x_{kt}\right) \right] ^{2}
\end{equation*}
However, as I have said above, when the explained variable can only assume
discrete or limited values, the unconstrained optimization would give us
undesirable outcomes. Therefore, if we continue to use least squares estimation,
we have to do the minimization under constraints like $0\leq f\left(
x_{1},x_{2},...,x_{k}\right) \leq 1$ in probability estimation. That is, we
are going to do:
\begin{eqnarray*}
&&\underset{\beta _{0},..\beta _{k}}{\min }\underset{t=1}{\overset{T}{\sum }}%
\left[ y_{t}-f\left( x_{1t},x_{2t},...,x_{kt}\right) \right] ^{2} \\
&&\text{ \ s.t \ }0\leq f\left( x_{1},x_{2},...,x_{k}\right) \leq 1
\end{eqnarray*}
Remember $f\left( x_{1},x_{2},...,x_{k}\right) $ is no longer linear, and
so we would not have this%
\begin{equation*}
f\left( x_{1},x_{2},...,x_{k}\right) =\beta _{0}+\beta _{1}x_{1}+...+\beta
_{k}x_{k}
\end{equation*}
and we might have a functional form like this (remember the uniform
distribution?)%
\begin{equation*}
f\left( x_{1},x_{2},...,x_{k}\right) =\frac{1}{\beta _{0}+\beta
_{1}x_{1}+...+\beta _{k}x_{k}}.
\end{equation*}
or even more complicated functions such as (remember the normal
distribution?)%
\begin{equation*}
f\left( x_{1},x_{2},...,x_{k}\right) =\frac{1}{\sqrt{2\pi }\sigma }\exp %
\left[ -\frac{\left( \beta _{0}+\beta _{1}x_{1}+...+\beta _{k}x_{k}\right)
^{2}}{2\sigma ^{2}}\right]
\end{equation*}
The problem of using differentiation with such complicated functions to
obtain a closed-form solution would lead us into the regime of numerical
optimization (remember the HKCEE bisection method?).
\subsection{Maximum Likelihood Estimation}
Although LS would still provide us a reasonable estimate, it is usually not
a good way to deal with probability function estimation. A much better way
to do this kind of estimation is maximum likelihood estimation (MLE).
The principle of MLE is just the opposite of what we do when we calculate a
probability given the parameters of a probability density function. We are going
to estimate the parameters based on "probability". Now, given the data, we
already know which outcome is true. In other words, if we travelled back
through a time machine, we would actually know which event has happened.
Then we choose the parameters of the density function so that it
gives the greatest "probability" to the events realized.
Under the classical definition of probability, the outcome of an event which has
already been realized does not fit the criteria of a random experiment. Hence, we
cannot call it "probability"; instead we change to the new term "likelihood" $L$,
which we try to maximize.
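For example, for the probit or logit model above, write $f\left(
x_{1t},...,x_{kt}\right) $ for the modelled probability that $y_{t}=1.$ Since
each observation is either $1$ or $0$, the likelihood of the whole sample is%
\begin{equation*}
L=\prod_{t=1}^{T}\left[ f\left( x_{1t},...,x_{kt}\right) \right] ^{y_{t}}%
\left[ 1-f\left( x_{1t},...,x_{kt}\right) \right] ^{1-y_{t}}
\end{equation*}
and MLE chooses the parameters $\beta _{0},...,\beta _{k}$ hidden inside $f$ to maximize $L$ (in practice, its logarithm).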
\subsection{Truncated Data}
When we do sampling, it is quite common that some group of the population cannot
be reached or observed easily. For example, it might be difficult to
observe the profit of a triad from Inland Revenue Department data.
In particular, if the variable below a certain value is unobserved, the data is called
lower-truncated or truncated from below. On the other hand, if the variable
above a certain value is unobserved, the data is called upper-truncated or truncated
from above.
If we know that certain data suffer from a truncation problem, we have to
change the likelihood function by using the conditional probability density
function instead of the unconditional one.
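For example, if the data are truncated from below at some value $a$ (only
observations with $x>a$ can be observed), each observation enters the
likelihood with the conditional density%
\begin{equation*}
f\left( x\mid X>a\right) =\frac{f\left( x\right) }{1-F\left( a\right) },\qquad x>a
\end{equation*}
instead of the unconditional density $f\left( x\right) .$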
\subsection{Censored Data}
Besides truncation, another common data problem is censoring. It means that
we know the value is greater than or smaller than some value but we do not
know its exact value. In other words, we observe an inequality rather
than an equality. For example, if we observe the wage of an unemployed person, we
would find that his wage is zero. However, if there were no welfare system, he
would still work for a living. This means his market wage is below the welfare
payment and strictly larger than zero, but we can only observe zero. That is,
wages below the welfare payment are censored from the data. As the
censoring point is from below, we call this censoring from below. If the point
is from above, then we call it censoring from above.
Of course, similar to the truncated case, the likelihood function would
change if we know the data have been censored. However, we do not need a
conditional probability; we only need to put more likelihood
(probability) on the censored point since it represents a whole range of
values below it.
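For example, if the data are censored from below at some value $c$ (for those
observations we only know the value is at most $c$), then, roughly, a censored
observation contributes the probability $F\left( c\right) $ and an uncensored
one contributes the density $f\left( x_{t}\right) $, so the likelihood looks like%
\begin{equation*}
L=\prod_{\text{censored}}F\left( c\right) \times \prod_{\text{uncensored}}f\left( x_{t}\right)
\end{equation*}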
\newpage
\section{Simultaneous equations}
\subsection{Interaction between explained and explanatory variable}
Very often, the explained variable is not completely independent of the
explanatory variables. For example, in the simple demand-supply framework, price
and quantity can affect each other. It would violate our assumption of
independence of $x_{t}$ $(Cov\left( x_{t},u_{t}\right) =0)$ if we were to
estimate the following regression equation following the usual OLS
procedure.
\begin{equation*}
p_{t}=\beta _{0}+\beta _{1}q_{t}+u_{t}
\end{equation*}%
One obvious problem is that the above regression equation ignores the
effect of price on the quantity. That is, we assume $\alpha _{0}=\alpha _{1}=0
$ in the following equation.%
\begin{equation*}
q_{t}=\alpha _{0}+\alpha _{1}p_{t}+v_{t}.
\end{equation*}
Therefore, in general, if we want to estimate variables which
simultaneously affect each other's values, we need to estimate both
equations at the same time. For example, if we know that $y$ would affect $x$
and vice versa, we need to estimate the following system of simultaneous
equations at one time:%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t} \\
x_{t}=\alpha _{0}+\alpha _{1}y_{t}+v_{t}%
\end{array}%
\right.
\end{equation*}
\subsection{Structural Form v.s. Reduced Form}
How can we estimate two equations at the same time? Remember how we solved
simultaneous equations in the algebra course in secondary school? Yes, we used the
substitution method, in which we try to eliminate one variable by plugging in the
other equation. Then, why not repeat the same procedure to reduce the
simultaneous equations to a single equation?
Putting second equation $x_{t}=\alpha _{0}+\alpha _{1}y_{t}+v_{t}$ into the
first equation $y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t},$ we have%
\begin{equation*}
y_{t}=\beta _{0}+\beta _{1}\left( \alpha _{0}+\alpha _{1}x_{t}+v_{t}\right)
+u_{t}
\end{equation*}
which can be further written as%
\begin{equation*}
y_{t}=\left( \beta _{0}+\beta _{1}\alpha _{0}\right) +\alpha
_{1}x_{t}+\left( v_{t}+u_{t}\right) .
\end{equation*}
Then, we can handle this with our ordinary least squares estimation method.
However, the substitution can also be done the other
way. We now put the first equation $y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t}$ into
the second equation $x_{t}=\alpha _{0}+\alpha _{1}y_{t}+v_{t},$ and we have%
\begin{equation*}
x_{t}=\alpha _{0}+\alpha _{1}\left( \beta _{0}+\beta _{1}y_{t}+u_{t}\right)
+v_{t}
\end{equation*}
and, after rearranging terms,%
\begin{equation*}
x_{t}=\left( \alpha _{0}+\alpha _{1}\beta _{0}\right) +\beta
_{1}y_{t}+\left( u_{t}+v_{t}\right) .
\end{equation*}
Again, we use our regression technique to recover the above coefficients.
To sum up, we can estimate the following equations separately,
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\left( \beta _{0}+\beta _{1}\alpha _{0}\right) +\alpha
_{1}x_{t}+\left( v_{t}+u_{t}\right) \\
x_{t}=\left( \alpha _{0}+\alpha _{1}\beta _{0}\right) +\beta
_{1}y_{t}+\left( u_{t}+v_{t}\right)
\end{array}%
\right.
\end{equation*}
using the least square method.
We call the original equations the structural form, since they show the
structure of the relationship between the variables, while the new equations are the
reduced form, since they combine all the equations.
\subsection{Indirect least-squares method}
Remember our target? We are going to estimate the structural form equations%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t} \\
x_{t}=\alpha _{0}+\alpha _{1}y_{t}+v_{t}%
\end{array}%
\right.
\end{equation*}
but our method above is going to estimate reduced form equations%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\left( \beta _{0}+\beta _{1}\alpha _{0}\right) +\alpha
_{1}x_{t}+\left( v_{t}+u_{t}\right) \\
x_{t}=\left( \alpha _{0}+\alpha _{1}\beta _{0}\right) +\beta
_{1}y_{t}+\left( u_{t}+v_{t}\right)
\end{array}%
\right.
\end{equation*}
This estimation method is called the indirect least-squares method (ILS) as we do
not use the least squares method to estimate the coefficients directly. The OLS
calculations would not tell us the value of each coefficient; instead, we
could only obtain the following%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\gamma _{0}+\gamma _{1}x_{t}+\lambda _{t} \\
x_{t}=\delta _{0}+\delta _{1}y_{t}+\omega _{t}%
\end{array}%
\right.
\end{equation*}
Then we have%
\begin{equation*}
\left\{
\begin{array}{c}
\gamma _{0}=\beta _{0}+\beta _{1}\alpha _{0} \\
\gamma _{1}=\alpha _{1} \\
\delta _{0}=\alpha _{0}+\alpha _{1}\beta _{0} \\
\delta _{1}=\beta _{1}%
\end{array}%
\right.
\end{equation*}
from which we can recover the values of $\alpha _{1}$ and $\beta _{1}$ but not $%
\beta _{0}$ nor $\alpha _{0}.$ Hence, ILS might not be able to recover everything you
want. This kind of problem is called the identification problem, which refers to
our inability to recover all coefficients of the structural form.
\subsection{Identification}
How can we identify all coefficients? It seems that our information is not
enough, as the number of unknowns exceeds the number of equations. What
information do we need? More variables. In particular, we need variables
specific to each equation. Remember, our problem is that the intercepts of the structural
form equations could not be recovered. If we have specific variables, we
can then trace back the intercepts as they remain unchanged even when the
specific variables vary.
Now, we know that ILS might suffer from the identification problem depending on
the existence of specific variables, which is really the result of having more
unknowns than equations. In theory, the number of equations
may also equal or exceed the number of unknowns. So, we have the following identification
conditions:
\begin{enumerate}
\item Exact-identification
All parameters can be found and solution is unique.
\item Under-identification
Not all parameter can be found.
\item Over-identification
All parameters can be found but solution is not unique.
\end{enumerate}
\subsection{Order condition v.s. Rank condition}
There are two methods to determine whether the system is
exactly identified, under-identified or over-identified, though the underlying
principle is similar.
\begin{enumerate}
\item Order condition
Remember that our identification requires specific variables? Hence, the
condition for an equation to be fully identified requires that it
cannot contain all the variables: some variables must be omitted so that the other
equations can have specific variables. So, the necessary (not
sufficient) condition is
\begin{equation*}
K\geq G-1
\end{equation*}
where $K$ is the number of excluded variables and $G$ is the number of equations.
\item Rank condition
Although this condition is much more complicated, it is still worth studying
as it is a necessary and sufficient condition for identification.
For a matrix $A,$ Rank$\left( A\right) $ is the number of independent rows
or the number of independent columns. In fact, they are always the same. To
check the identification condition, we need to first write the system in
matrix form. That is, if the structural form is%
\begin{equation*}
\left\{
\begin{array}{c}
x_{t}=\alpha _{0}+\alpha _{1}y_{t}+\alpha _{2}a_{t}+v_{t} \\
y_{t}=\beta _{0}+\beta _{1}x_{t}+\beta _{2}a_{t}+\beta _{3}b_{t}+u_{t} \\
z_{t}=\gamma _{0}+\gamma _{1}y_{t}+\gamma _{2}b_{t}+\gamma _{3}c_{t}+\omega
_{t}%
\end{array}%
\right.
\end{equation*}
The matrix form would then be%
\begin{equation*}
\begin{pmatrix}
1 & -\alpha _{1} & 0 \\
-\beta _{1} & 1 & 0 \\
0 & -\gamma _{1} & 1%
\end{pmatrix}%
\begin{pmatrix}
x_{t} \\
y_{t} \\
z_{t}%
\end{pmatrix}%
=%
\begin{pmatrix}
\alpha _{2} & 0 & 0 \\
\beta _{2} & \beta _{3} & 0 \\
0 & \gamma _{2} & \gamma _{3}%
\end{pmatrix}%
\begin{pmatrix}
a_{t} \\
b_{t} \\
c_{t}%
\end{pmatrix}%
+%
\begin{pmatrix}
v_{t} \\
u_{t} \\
\omega _{t}%
\end{pmatrix}%
\end{equation*}
We have to check the rank of the combined coefficient matrix:%
\begin{equation*}
\begin{pmatrix}
1 & -\alpha _{1} & 0 & \alpha _{2} & 0 & 0 \\
-\beta _{1} & 1 & 0 & \beta _{2} & \beta _{3} & 0 \\
0 & -\gamma _{1} & 1 & 0 & \gamma _{2} & \gamma _{3}%
\end{pmatrix}%
\end{equation*}
To check whether the first equation is identified or not, we need to:
\begin{enumerate}
\item look at the first row and locate the columns with zero entries.%
\begin{equation*}
\begin{pmatrix}
1 & -\alpha _{1} & 0 & \alpha _{2} & 0 & 0%
\end{pmatrix}%
\end{equation*}
which has zeros in the 3rd, 5th and 6th columns
\item take out those columns%
\begin{equation*}
\begin{pmatrix}
0 & 0 & 0 \\
0 & \beta _{3} & 0 \\
1 & \gamma _{2} & \gamma _{3}%
\end{pmatrix}%
\end{equation*}
\item remove the first row%
\begin{equation*}
\begin{pmatrix}
0 & \beta _{3} & 0 \\
1 & \gamma _{2} & \gamma _{3}%
\end{pmatrix}%
\end{equation*}
\item calculate the rank, which is equal to 2 in the above case.
\item If rank = $G-1,$ then the equation is identified, otherwise not.
In the above case, $G-1=3-1=2$ and so it is identified.
\end{enumerate}
Now, if you want to check second or third equation, just follow the same
procedure but change the target row in step a and step c.
\end{enumerate}
\subsection{Two-stage least-squares estimation}
What can we do if the system is under-identified or over-identified?
Fortunately (or perhaps unfortunately for you), we have another way to do the
estimation besides ILS.
Remember, the OLS assumption we violate is $Cov\left( x_{t},u_{t}\right) =0.$ So, we
could still use OLS if we replace $x_{t}$ by a proxy variable. In
two-stage least-squares estimation, we use the reduced-form prediction $\hat{x%
}_{t}$ to replace $x_{t}.$ Then we perform OLS on the structural equations
directly.
Formally, if we wish to estimate this system%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t} \\
x_{t}=\alpha _{0}+\alpha _{1}y_{t}+v_{t}%
\end{array}%
\right.
\end{equation*}
we could first estimate this reduced equation%
\begin{equation*}
x_{t}=\delta _{0}+\delta _{1}y_{t}+\omega _{t}
\end{equation*}
where $\delta _{0}=\alpha _{0}+\alpha _{1}\beta _{0},$ $\delta _{1}=\beta
_{1}$ and $\omega _{t}=u_{t}+v_{t}.$
Then, we could obtain $\hat{x}_{t}$ by
\begin{equation*}
\hat{x}_{t}=\hat{\delta}_{0}+\hat{\delta}_{1}y_{t}
\end{equation*}
Since our estimates of the coefficients would not have changed if the order of the
variables were reversed (as long as those assumptions are still
satisfied), we can estimate this system%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\beta _{0}+\beta _{1}x_{t}+u_{t} \\
y_{t}=-\frac{\alpha _{0}}{\alpha _{1}}+\frac{1}{\alpha _{1}}x_{t}+\omega _{t}%
\end{array}%
\right.
\end{equation*}
and do the OLS estimation by replacing $x_{t}$ by $\hat{x}_{t}.$%
\begin{equation*}
\left\{
\begin{array}{c}
y_{t}=\beta _{0}+\beta _{1}\hat{x}_{t}+u_{t} \\
y_{t}=-\frac{\alpha _{0}}{\alpha _{1}}+\frac{1}{\alpha _{1}}\hat{x}%
_{t}+\omega _{t}%
\end{array}%
\right.
\end{equation*}
Then we could recover all coefficients $\alpha _{0},\alpha _{1},\beta _{0}$
and $\beta _{1}.$
\bigskip \bigskip \vspace{1.6in}
\begin{center}
{\LARGE Congratulations!\medskip \bigskip }
{\Large You have reached the last page!\medskip \bigskip }
{\large Thank you for reading!\bigskip \bigskip }
\texttt{Lazy Production @ 2006}
\end{center}
\end{document}