
Notes: Principles of Econometrics / Statistical Method in Economics

SME Note 2

These are notes for the Statistical Method in Economics course, made for quick reference to important formulae.

kw: xi, residual

\(\sum_{i=1}^n x_i e_i=0\), the sum of the residuals weighted by \(x_i\) is 0:

\[\begin{aligned} \sum_{i=1}^n x_i e_i &=\sum_{i=1}^n x_i\left(y_i-b_1-b_2 x_i\right)=\sum_{i=1}^n x_i y_i-n \bar{x} b_1-b_2 \sum_{i=1}^n x_i^2 \\ &=\sum_{i=1}^n x_i y_i-n \bar{x}\left(\bar{y}-b_2 \bar{x}\right)-b_2 \sum_{i=1}^n x_i^2 \\ &=\sum_{i=1}^n x_i y_i-n \bar{x} \bar{y}+b_2 n \bar{x}^2-b_2 \sum_{i=1}^n x_i^2 \\ &=S_{x y}-b_2 S_{x x}=0 \end{aligned} \]

kw: xi, residual

\(x_i\) and \(e_i\) are uncorrelated (in the sample):

\[\operatorname{Cov}\left(x_i, e_i\right)=\frac{1}{n-1} \sum_{i=1}^n\left(x_i-\bar{x}\right)\left(e_i-\bar{e}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n x_i e_i-n \bar{x} \bar{e}\right)=0 \]

kw: yhat, residual

\(\sum_{i=1}^n \hat{y}_i e_i=0\), the sum of the residuals weighted by \(\hat{y}_i\) is 0:

\[\sum_{i=1}^n \hat{y}_i e_i=\sum_{i=1}^n\left(b_1+b_2 x_i\right) e_i=b_1 n \bar{e}+b_2 \sum_{i=1}^n x_i e_i=0 \]

\(\hat{y}_i\) and \(e_i\) are uncorrelated (in the sample):

\[\operatorname{Cov}\left(\hat{y}_i, e_i\right)=\frac{1}{n-1} \sum_{i=1}^n\left(\hat{y}_i-\overline{\hat{y}}\right)\left(e_i-\bar{e}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n \hat{y}_i e_i-n \overline{\hat{y}} \bar{e}\right)=0 ; \]
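These four identities are easy to verify numerically. Below is a minimal sketch (numpy only; the simulated data, seed, and variable names are illustrative assumptions, not from the course):

```python
# Verify the residual identities for a simple OLS fit (numpy only).
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # true beta1 = 1, beta2 = 2

# Least squares coefficients: b2 = S_xy / S_xx, b1 = ybar - b2 * xbar
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x
e = y - yhat

print(np.sum(x * e))                   # ~0: x-weighted residual sum
print(np.sum(yhat * e))                # ~0: yhat-weighted residual sum
print(np.cov(x, e, ddof=1)[0, 1])      # ~0: sample Cov(x, e)
print(np.cov(yhat, e, ddof=1)[0, 1])   # ~0: sample Cov(yhat, e)
```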

kw: le, linear estimators, variance, covariance

The variances and covariance of \(b_1\) and \(b_2\) are given by

\[\begin{aligned} \operatorname{Var}\left(b_1\right) &=\frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2} ; \\ \operatorname{Var}\left(b_2\right) &=\frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} ; \\ \operatorname{Cov}\left(b_1, b_2\right) &=\frac{-\bar{x} \sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}, \end{aligned} \]
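These are population formulas (they involve the unknown \(\sigma^2\)). A minimal Monte Carlo sketch, assuming a known \(\sigma\) and \(x\) held fixed across replications, shows the empirical variance of \(b_2\) matching \(\sigma^2 / \sum_{i=1}^n(x_i-\bar{x})^2\):

```python
# Monte Carlo check of Var(b2) = sigma^2 / S_xx (numpy only).
import numpy as np

rng = np.random.default_rng(1)
n, sigma, beta1, beta2 = 30, 2.0, 1.0, 0.5
x = rng.uniform(0, 10, n)               # x held fixed across replications
sxx = np.sum((x - x.mean()) ** 2)

b2_draws = []
for _ in range(20000):
    y = beta1 + beta2 * x + rng.normal(0, sigma, n)
    b2_draws.append(np.sum((x - x.mean()) * (y - y.mean())) / sxx)

print(np.var(b2_draws))   # empirical Var(b2) across simulated samples
print(sigma**2 / sxx)     # theoretical Var(b2); the two should be close
```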

kw: sst, ssr, sse, r2, coefficient of determination, sample correlation coefficient

\[ SST = SSR + SSE \] \[R^2=\frac{\mathrm{SSR}}{\mathrm{SST}}=1-\frac{\mathrm{SSE}}{\mathrm{SST}} \]
  • \(R^2=r_{x y}^2\) : The coefficient of determination \(R^2\) is algebraically equal to the square of the sample correlation coefficient \(r_{x y}\) between \(x\) and \(y\). This result is valid in simple linear regression models;
  • \(R^2=r_{y \hat{y}}^2\) : The coefficient of determination \(R^2\) can also be computed as the square of the sample correlation coefficient between \(y\) and \(\hat{y}\). In this case, it measures the "goodness-of-fit" between the sample data and their predicted values. Therefore, \(R^2\) is sometimes called a measure of "goodness-of-fit". This result is valid not only in simple linear regression models but also in multiple linear regression models.
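A short numerical check of both identities (a numpy sketch; the simulated data are illustrative):

```python
# Check R^2 = r_xy^2 = r_{y,yhat}^2 in simple linear regression.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
y = 3.0 - 1.5 * x + rng.normal(0, 2, 40)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - yhat) ** 2)

print(1 - sse / sst)                      # R^2 = 1 - SSE/SST
print(np.corrcoef(x, y)[0, 1] ** 2)       # r_xy^2
print(np.corrcoef(y, yhat)[0, 1] ** 2)    # r_{y,yhat}^2
```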

kw: unbiased sample variance

\[\sigma_{\bar{y}}^2=\frac{n}{n-1} \sigma_y^2=\frac{1}{n-1} \sum_{i=1}^n\left(y_i-\bar{y}\right)^2=\frac{S_{y y}}{n-1}, \]
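In NumPy this is `np.var` with `ddof=1` (a tiny check; the data vector is made up):

```python
# Unbiased sample variance: divide by n-1, i.e. np.var(..., ddof=1).
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(y)
syy = np.sum(y**2) - n * y.mean() ** 2        # S_yy = sum y_i^2 - n*ybar^2
print(np.var(y, ddof=1), syy / (n - 1))       # equal up to floating point
```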

kw: mean squared error, mse

The mean squared error (MSE):

\[\hat{\sigma}^2 \equiv \sigma_{\hat{y}}^2=\frac{1}{n-2} \sum_{i=1}^n\left(y_i-\hat{y}_i\right)^2=\frac{1}{n-2} \sum_{i=1}^n e_i^2=\frac{S_{y y}-2 b_2 S_{x y}+b_2^2 S_{x x}}{n-2} \]

In this case, we have to divide by \(n-2\), because we estimated the unknown population intercept \(\beta_1\) and the population slope \(\beta_2\) by \(b_1\) and \(b_2\), respectively, which "costs us two degrees of freedom";
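A quick check that the two expressions for \(\hat{\sigma}^2\) agree (a numpy sketch with simulated data):

```python
# sigma-hat^2 two ways: mean squared residual vs. the S_xx/S_xy/S_yy form.
import numpy as np

rng = np.random.default_rng(3)
n = 25
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

sxx = np.sum(x**2) - n * x.mean() ** 2
sxy = np.sum(x * y) - n * x.mean() * y.mean()
syy = np.sum(y**2) - n * y.mean() ** 2
b2 = sxy / sxx
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)

print(np.sum(e**2) / (n - 2))                        # direct form
print((syy - 2 * b2 * sxy + b2**2 * sxx) / (n - 2))  # equivalent form
```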

  • The unbiased sample covariance can be similarly defined as
\[\sigma_{\bar{x} \bar{y}}^2=\frac{1}{n-1} \sum_{i=1}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)=\frac{S_{x y}}{n-1}, \]

Why is the denominator \(n-1\) rather than \(n-2\)? Because no fitted values \(\hat{y}_i\) are involved: the sample covariance does not use the estimated coefficients \(b_1\) and \(b_2\), only the sample means, so (as with the sample variance) just one degree of freedom is lost.
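NumPy's `np.cov` also divides by \(n-1\) by default (`bias=False`), matching this definition (the data vectors are made up):

```python
# Unbiased sample covariance: np.cov divides by n-1 by default.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
n = len(x)
sxy = np.sum(x * y) - n * x.mean() * y.mean()  # S_xy
print(np.cov(x, y)[0, 1], sxy / (n - 1))       # both give 2.5 here
```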

kw: standard errors, se, b1, b2

\[\begin{aligned} &\operatorname{se}\left(b_1\right)=\sqrt{\widehat{\operatorname{Var}}\left(b_1\right)}=\left[\frac{\hat{\sigma}^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right]^{1 / 2}, \\ &\operatorname{se}\left(b_2\right)=\sqrt{\widehat{\operatorname{Var}}\left(b_2\right)}=\left[\frac{\hat{\sigma}^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right]^{1 / 2}, \end{aligned} \]
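A sketch computing both standard errors from \(\hat{\sigma}^2\) and cross-checking against `scipy.stats.linregress`, whose `stderr` and `intercept_stderr` fields report the same quantities in recent SciPy versions (the simulated data are illustrative):

```python
# se(b1) and se(b2) from the formulas, cross-checked with scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b1 = y.mean() - b2 * x.mean()
sigma2_hat = np.sum((y - (b1 + b2 * x)) ** 2) / (n - 2)

se_b1 = np.sqrt(sigma2_hat * np.sum(x**2) / (n * sxx))
se_b2 = np.sqrt(sigma2_hat / sxx)

res = stats.linregress(x, y)
print(se_b1, res.intercept_stderr)   # should agree
print(se_b2, res.stderr)             # should agree
```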

kw: Adjusted R Square, radj, r2
Adjusted \(R^2\)

\[\bar{R}^2=1-\frac{\left(1-R^2\right)(n-1)}{n-K}, \]

where \(K\) is the number of estimated parameters, including the intercept (so \(K=2\) in the simple linear regression model).
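As a one-line helper (the function name is mine):

```python
def adjusted_r2(r2: float, n: int, K: int) -> float:
    """R-bar^2 = 1 - (1 - R^2)(n - 1) / (n - K)."""
    return 1 - (1 - r2) * (n - 1) / (n - K)

print(adjusted_r2(0.75, n=50, K=2))   # ~0.7448: slightly below R^2 = 0.75
```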

kw: p-value

It is standard practice to report the probability value of the test (i.e., the \(p\)-value) when reporting the outcome of a statistical hypothesis test. Given the \(p\)-value of a test, \(p\), we can determine the outcome by comparing \(p\) to the chosen level of significance, \(\alpha\), without looking up or calculating critical values. The \(p\)-value rule is to reject the null hypothesis when the \(p\)-value is less than or equal to the level of significance: reject \(H_0\) if \(p \leq \alpha\); do not reject \(H_0\) if \(p>\alpha\).
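For a two-tailed \(t\)-test this looks as follows (a sketch using `scipy.stats.t`; the test statistic and degrees of freedom are made-up numbers):

```python
# The p-value rule for a two-tailed t-test of H0: beta2 = 0.
from scipy import stats

t_stat, df, alpha = 2.37, 28, 0.05
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value

print(p_value)            # ~0.025
print(p_value <= alpha)   # True -> reject H0 at the 5% level
```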

kw: forecast error

\[f=y_0-\hat{y}_0=\left(\beta_1+\beta_2 x_0+\epsilon_0\right)-\left(b_1+b_2 x_0\right) \]

kw: Least squares prediction, f, ci for f, variance of forecast error

\[\operatorname{Var}(f)=\operatorname{Var}\left(y_0\right)+\operatorname{Var}\left(\hat{y}_0\right)-2 \operatorname{Cov}\left(y_0, \hat{y}_0\right) \]

Since \(x_0\) and the unknown parameters \(\beta_1\) and \(\beta_2\) are not random, \(\operatorname{Var}(y_0)=\operatorname{Var}(\epsilon_0)\); and since \(\hat{y}_0\) depends only on the in-sample errors while the new error \(\epsilon_0\) is independent of them, \(\operatorname{Cov}(y_0, \hat{y}_0)=0\). Hence

\[\operatorname{Var}(f)=\operatorname{Var}\left(\epsilon_0\right)+\operatorname{Var}\left(\hat{y}_0\right)=\sigma^2+\operatorname{Var}\left(\hat{y}_0\right) \]

where

\[\begin{aligned} \operatorname{Var}\left(\hat{y}_0\right)=& \operatorname{Var}\left(b_1+b_2 x_0\right)=\operatorname{Var}\left(b_1\right)+x_0^2 \operatorname{Var}\left(b_2\right)+2 x_0 \operatorname{Cov}\left(b_1, b_2\right) \\ =& \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+x_0^2 \frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+2 x_0 \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} \\ =& {\left[\frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}-\frac{\sigma^2 n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] } \\ &+\left[x_0^2 \frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+2 x_0 \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{\sigma^2 n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{\sum_{i=1}^n x_i^2-n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{x_0^2-2 x_0 \bar{x}+\bar{x}^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{1}{n}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] . \end{aligned} \]

Therefore, by replacing \(\sigma^2\) by its estimate \(\hat{\sigma}^2\), the estimated variance of the forecast error is given by

\[\widehat{\operatorname{Var}}(f)=\hat{\sigma}^2\left[1+\frac{1}{n}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \]

the square root of which is the standard error of the forecast error

\[\operatorname{se}(f)=\sqrt{\widehat{\operatorname{Var}}(f)} \]
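Putting the pieces together, a sketch of a 95% prediction interval \(\hat{y}_0 \pm t_c \operatorname{se}(f)\) (numpy plus `scipy.stats` for the critical value; the data and \(x_0\) are illustrative):

```python
# se(f) and a 95% prediction interval for y at x0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b1 = y.mean() - b2 * x.mean()
sigma2_hat = np.sum((y - (b1 + b2 * x)) ** 2) / (n - 2)

x0 = 7.5
y0_hat = b1 + b2 * x0
se_f = np.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))

t_c = stats.t.ppf(0.975, n - 2)                    # two-sided 95% critical value
print(y0_hat - t_c * se_f, y0_hat + t_c * se_f)    # prediction interval
```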