Basic Concepts of Statistics

This post summarizes the basic terms and concepts used in statistics.

  • Sample space (표본공간)
: The sample space $S$ is a set that contains all possible experimental outcomes.

  • Experiment (실험)
: Any process for which more than one outcome is possible. (Any process that generates data)

  • Event (사건)
: A subset of the sample space $S$

  • Random variable
: A function that assigns a real number to each element of the sample space.
$$\text{Real numbers} = f(\text{Elements of the sample space})$$



Types of random variables

1. Discrete random variables (이산확률변수)
: A random variable whose possible values are discrete (countable), e.g. 0, 1, 2, ...

2. Continuous random variables (연속확률변수)
: A random variable whose possible values are continuous (uncountable).



  • Probability function ($f$)
$$p = f(x)$$
$$x\;:\;\text{real numbers that the random variable can take}$$
$$0 \leq p \leq 1$$

If $X$ is a discrete random variable, $f$ is a probability mass function (p.m.f., 확률질량함수);
if $X$ is a continuous random variable, $f$ is a probability density function (p.d.f., 확률밀도함수).



Probability mass function (pmf : 확률 질량 함수)

  • For a discrete random variable $X$
  • Let $x$ be a possible value of $X$
  • The probability assigned to each value $x$ of $X$ is $P[X=x]$
  • A discrete random variable $X$ has the probability mass function (p.m.f.)
$$f(x) = P[X = x]$$
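As a minimal sketch (the fair-die distribution below is an illustrative assumption, not part of the text), a p.m.f. can be written as a function mapping each possible value to its probability:

```python
# p.m.f. of a fair six-sided die: f(x) = P[X = x] = 1/6 for x in {1, ..., 6}
def pmf_die(x):
    return 1 / 6 if x in {1, 2, 3, 4, 5, 6} else 0.0

# Each probability lies in [0, 1], and they sum to 1 over all possible values.
total = sum(pmf_die(x) for x in range(1, 7))
```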



Expectation (E, mean)

For a discrete random variable $X$ with p.m.f. $p(x)$
$$E[X] = \sum_i x_i p (x_i) $$
$E[X]$ is the expected value of the random variable $X$.
More precisely, $E[X]$ is the weighted average of the possible values of $X$, where each value is weighted by the probability that it will occur.

  • $E[c] = c,\;\;\;\;c\;:\;constant$
  • $E[cX] = cE[X]$
  • $E[cX+d] = cE[X] + d,\;\;\;\;d\;:\;constant$
  • $E\big[\sum_{i=1}^n X_i \big] = \sum_{i=1}^n E[X_i] $
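The weighted-average definition and the linearity property above can be checked numerically; the fair-die distribution below is an illustrative assumption:

```python
# E[X] = sum_i x_i p(x_i): each value weighted by its probability.
values = [1, 2, 3, 4, 5, 6]      # possible values of a fair die (illustrative)
probs = [1 / 6] * 6              # p.m.f. weights, summing to 1

E_X = sum(x * p for x, p in zip(values, probs))   # close to 3.5

# Linearity: E[cX + d] = c E[X] + d
c, d = 2, 1
E_cXd = sum((c * x + d) * p for x, p in zip(values, probs))
```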

Variance (V, Var)

For a discrete random variable $X$ with mean $\mu$, the variance of $X$ denoted by $V(X)$ is defined by
$$V[X] = E[(X-\mu )^2] $$
Variance can be interpreted as the expected squared deviation of $X$ about its mean.
It shows how much the values of $X$ vary (or disperse) around the expected value.
Variance cannot be negative.
$$\begin{align*} V(X) &= E[(X-E[X] )^2]\\ &= E[X^2 - 2XE[X] + E[X]^2 ] \\ &= E[X^2] - 2E[X]^2 + E[X]^2 \\ &= E[X^2] - E[X]^2 \end{align*}$$

  • $V(c) = 0$
  • $V(cX) = c^2V(X)$
  • $V(cX+d) = c^2V(X)$

The square root of $V(X)$ is called the standard deviation of the random variable $X$.
$$SD[X] = \sigma = \sqrt{V(X)} $$
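A quick numerical check of both variance formulas and the standard deviation, again on an illustrative fair-die distribution:

```python
import math

# Variance via the definition and via the shortcut E[X^2] - E[X]^2.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(x * p for x, p in zip(values, probs))                        # E[X]
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))       # E[(X - mu)^2]
var_short = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2  # E[X^2] - E[X]^2

sd = math.sqrt(var_def)   # SD[X] = sqrt(V(X))
```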



Mean Vectors

Let $X$ and $Y$ be random matrices of the same dimension, and let $A$ and $B$ be constant matrices.
$$E(X) = \begin{bmatrix}E(X_1)\\E(X_2)\\ \vdots \\ E(X_p) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} = \boldsymbol{\mu} $$

$$E(X+Y) = E(X)+E(Y)$$
$$E(AXB) = AE(X)B$$
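The identity $E(AXB) = AE(X)B$ can be verified on a small discrete random matrix; all matrices and probabilities below are illustrative assumptions:

```python
import numpy as np

# A random 2x2 matrix X taking two values with equal probability (illustrative).
X1 = np.array([[1.0, 2.0], [3.0, 4.0]])
X2 = np.array([[0.0, 1.0], [1.0, 0.0]])
probs = [0.5, 0.5]

A = np.array([[1.0, -1.0], [0.0, 2.0]])   # constant matrices
B = np.array([[2.0, 0.0], [1.0, 1.0]])

E_X = probs[0] * X1 + probs[1] * X2                      # E(X), element-wise
lhs = probs[0] * (A @ X1 @ B) + probs[1] * (A @ X2 @ B)  # E(AXB)
rhs = A @ E_X @ B                                        # A E(X) B
```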



Covariance Matrix (공분산 행렬)

$$\begin{align*}\Sigma(X) &= E[(X-\mu)(X-\mu)^T] \\ & \\ &= \begin{bmatrix} E[(X_1-\mu_1)(X_1-\mu_1)] & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_p-\mu_p)]\\ E[(X_2-\mu_2)(X_1-\mu_1)] & E[(X_2-\mu_2)(X_2-\mu_2)] & \cdots & E[(X_2-\mu_2)(X_p-\mu_p)]\\ \vdots & \vdots &  \ddots & \vdots \\ E[(X_p-\mu_p)(X_1-\mu_1)] & E[(X_p-\mu_p)(X_2-\mu_2)] & \cdots &  E[(X_p-\mu_p)(X_p-\mu_p)] \end{bmatrix} \\ & \\ &= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots &  \sigma_{pp} \end{bmatrix} \end{align*}$$
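As a sketch of how this matrix arises in practice, the empirical covariance matrix of sampled data is the average outer product of the centered sample vectors (the simulated data below are an illustrative assumption):

```python
import numpy as np

# Empirical covariance matrix: (1/n) sum over samples of (x - mu)(x - mu)^T.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # 1000 samples of a p = 3 random vector

mu = X.mean(axis=0)                     # sample mean vector
centered = X - mu
Sigma = centered.T @ centered / len(X)  # p x p covariance matrix

# np.cov with bias=True computes the same population-style estimate.
Sigma_np = np.cov(X, rowvar=False, bias=True)
```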


Correlation Matrix (상관계수 행렬)

$$\begin{align*} \rho_X &= \begin{bmatrix} \frac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} \\  \frac{\sigma_{21}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{11}}} & \frac{\sigma_{22}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} \\ \vdots &\vdots & \ddots & \vdots \\ \frac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} & \frac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} & \cdots & \frac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}} \end{bmatrix} \\ & \\ &= \begin{bmatrix} 1& \rho_{12} & \cdots & \rho_{1p} \\  \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix} \end{align*}$$
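Each entry divides $\sigma_{ij}$ by $\sqrt{\sigma_{ii}}\sqrt{\sigma_{jj}}$, which can be sketched with a hypothetical covariance matrix:

```python
import numpy as np

# rho_ij = sigma_ij / (sqrt(sigma_ii) * sqrt(sigma_jj))
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 9.0, 1.5],
                  [0.5, 1.5, 1.0]])     # illustrative covariance matrix

d = np.sqrt(np.diag(Sigma))             # standard deviations sqrt(sigma_ii)
rho = Sigma / np.outer(d, d)            # entrywise division by sqrt(sigma_ii)sqrt(sigma_jj)
```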



Let $X_1$ and $X_2$ be random variables and $a,\,b,\,c$ be constants.
$$E(cX_1) = cE(X_1) = c\mu_1$$
$$Var(cX_1) = E[(cX_1 - c\mu_1)^2] = c^2 Var(X_1) = c^2 \sigma_{11}$$
$$Cov(aX_1,bX_2) = ab\,Cov(X_1,X_2)$$

For the linear combination $aX_1 + bX_2$
$$E(aX_1 + bX_2) = aE(X_1) + bE(X_2) = a \mu_1 + b \mu_2 $$
$$\begin{align*} Var(aX_1 + bX_2) &= a^2 Var(X_1) + b^2 Var(X_2) + 2ab \, Cov(X_1,X_2) \\ &= a^2 \sigma_{11} + b^2 \sigma_{22} + 2ab\sigma_{12} \end{align*}$$


With $C^T = [a,b]$, $aX_1 + bX_2$ can be written as
$$aX_1 + bX_2 = \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = C^T X $$
$$E(aX_1 + bX_2) = E(C^T X) = C^T E(X) = \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = a \mu_1 + b \mu_2$$
$$ Var(aX_1 + bX_2) = V(C^T X) = C^TV(X)C = a^2 \sigma_{11} + b^2 \sigma_{22} + 2ab\sigma_{12}$$
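A numerical check that the quadratic form $C^T \Sigma C$ reproduces the scalar formula $a^2\sigma_{11} + b^2\sigma_{22} + 2ab\sigma_{12}$ (the covariance values and coefficients are illustrative):

```python
import numpy as np

# Var(aX1 + bX2) as a quadratic form versus the expanded scalar formula.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 9.0]])          # covariance matrix of (X1, X2)
a, b = 2.0, -1.0
C = np.array([a, b])

quad = C @ Sigma @ C                    # C^T Sigma C
scalar = a**2 * Sigma[0, 0] + b**2 * Sigma[1, 1] + 2 * a * b * Sigma[0, 1]
```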



For the linear combination $C^TX = C_1X_1 + C_2X_2 +\cdots + C_pX_p$,
$$E[C^TX] = C^T\mu_X$$
$$Var[C^TX] = C^T\Sigma_X C$$


In general, consider the $q$ linear combinations of the $p$ random variables $X_1,X_2,\cdots,X_p$. Let $C$ be a $q \times p$ constant matrix.
$$Z = \begin{bmatrix}Z_1\\Z_2\\\vdots \\ Z_q \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1p} \\ C_{21} & C_{22} & \cdots & C_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ C_{q1} & C_{q2} & \cdots & C_{qp} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = CX$$
$$\mu_Z = E(Z) = E(CX) = CE(X) = C \mu_X$$
$$\Sigma_Z = Cov(Z) = Cov(CX) = C\,Cov(X)C^T = C \Sigma_X C^T$$
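These identities can be checked empirically on simulated data (the matrix $C$ and the simulated samples are illustrative assumptions):

```python
import numpy as np

# q linear combinations of p variables: Z = C X, with
# mu_Z = C mu_X and Cov(Z) = C Sigma_X C^T.
rng = np.random.default_rng(1)
X = rng.normal(size=(10000, 3))         # n samples of a p = 3 random vector
C = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])        # q = 2 combinations, C is 2 x 3

Z = X @ C.T                             # each row is z = C x
mu_X = X.mean(axis=0)
mu_Z = Z.mean(axis=0)
Sigma_X = np.cov(X, rowvar=False, bias=True)
Sigma_Z = np.cov(Z, rowvar=False, bias=True)
# Sample mean and covariance of Z match C mu_X and C Sigma_X C^T up to rounding.
```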



※ This post was written based on Professor 김성범's Predictive Models lecture at the Department of Industrial Management Engineering, Korea University, together with my own study notes.
