Dimensionality Reduction (Summary)
1. Feature extraction / Dimensionality reduction
- Given data points in d dimensions
- Convert them to data points in k (< d) dimensions
- With minimal loss of information
2. Principal Component Analysis (PCA)
- Find k-dim projection that best preserves variance
- Process
- Compute the mean vector \mu and covariance matrix \Sigma of the original data: \Sigma = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T
- Compute the eigenvectors and eigenvalues of \Sigma: \Sigma v = \lambda v
- Select the k largest eigenvalues and their corresponding eigenvectors
- Project the points onto the subspace they span: y = A(x - \mu), where the rows of A are the selected eigenvectors (see the sketch at the end of this section)
- The eigenvector with the largest eigenvalue captures the most variation in the data X
- We can compress the data by keeping only the top few eigenvectors (principal components)
- The projected feature components y_k are uncorrelated (the principal components are mutually orthogonal)
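
A minimal NumPy sketch of the PCA steps above (the function name `pca` and the random test data are illustrative assumptions, not from the source):

```python
import numpy as np

def pca(X, k):
    """Sketch of the PCA steps above: mean, covariance, eigendecomposition,
    top-k eigenvectors, projection y = A(x - mu)."""
    mu = X.mean(axis=0)                        # mean vector mu
    Xc = X - mu                                # center the data
    Sigma = Xc.T @ Xc / X.shape[0]             # covariance matrix (1/N convention, as above)
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: Sigma is symmetric; eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    A = eigvecs[:, top].T                      # k x d projection matrix, rows = eigenvectors
    return Xc @ A.T, A, mu                     # projected data y_i = A(x_i - mu)

# usage (illustrative): 200 correlated points in 5-D reduced to 2-D
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Y, A, mu = pca(X, k=2)
print(Y.shape)                                  # (200, 2)
print(np.round(np.cov(Y, rowvar=False), 3))     # ~diagonal: projected features are uncorrelated
```

The near-diagonal covariance of Y illustrates the last bullet: the projected components are uncorrelated because the eigenvectors of \Sigma are orthogonal.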
3. Linear Discriminant Analysis (LDA)
- PCA vs. LDA
- PCA does not consider "class" information
- LDA considers "class" information
- PCA maximizes projected total scatter
- LDA maximizes ratio of projected between-class to projected within-class scatter
- Within-class scatter (want to minimize): \Sigma_w = \sum_{j=1}^{c} \frac{1}{N_j}\sum_{i=1}^{N_j} (x_i - \mu_j)(x_i - \mu_j)^T, where the inner sum runs over the N_j points of class j
- Between-class scatter (want to maximize): \Sigma_b = \frac{1}{c}\sum_{j=1}^{c} (\mu_j - \mu)(\mu_j - \mu)^T
- Compute eigenvectors and eigenvalues of \Sigma_w^{-1}\Sigma_b: \Sigma_w^{-1}\Sigma_b v = \lambda v (see the sketch below)
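
A minimal NumPy sketch following the same conventions as the formulas above (per-class normalization inside \Sigma_w, uniform 1/c weighting in \Sigma_b); the function name `lda` and the synthetic two-class data are illustrative assumptions:

```python
import numpy as np

def lda(X, labels, k):
    """Sketch of the LDA steps above: within-class and between-class scatter,
    then eigenvectors of Sigma_w^{-1} Sigma_b."""
    classes = np.unique(labels)
    d = X.shape[1]
    mu = X.mean(axis=0)                                   # overall mean
    Sigma_w = np.zeros((d, d))
    Sigma_b = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        Sigma_w += (Xc - mu_c).T @ (Xc - mu_c) / len(Xc)  # within-class scatter term
        diff = (mu_c - mu).reshape(-1, 1)
        Sigma_b += diff @ diff.T / len(classes)           # between-class scatter term
    # generalized eigenproblem Sigma_b v = lambda Sigma_w v, solved via Sigma_w^{-1} Sigma_b
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sigma_w) @ Sigma_b)
    top = np.argsort(eigvals.real)[::-1][:k]              # k largest eigenvalues
    W = eigvecs[:, top].real                              # d x k projection matrix
    return X @ W, W

# usage (illustrative): two 3-D Gaussian classes projected to 1-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(3.0, 1.0, (100, 3))])
labels = np.array([0] * 100 + [1] * 100)
Y, W = lda(X, labels, k=1)
print(Y.shape)  # (200, 1); LDA gives at most c-1 useful directions, so k=1 for two classes
```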