Dimensionality Reduction (Summary)

1. Feature extraction / Dimensionality reduction
  • Given data points in d dimensions
  • Convert them to data points in k (< d) dimensions
  • With minimal loss of information

2. Principal Component Analysis (PCA)
  • Find k-dim projection that best preserves variance
  • Process
    1. Compute the mean vector \mu and covariance matrix \Sigma of the original data: \Sigma = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T
    2. Compute the eigenvectors and eigenvalues of \Sigma: \Sigma v = \lambda v
    3. Select the k largest eigenvalues and their corresponding eigenvectors
    4. Project the points onto the subspace spanned by them: y = A(x - \mu), where the rows of A are the selected eigenvectors (see the sketch at the end of this section)
  • The eigenvector with the largest eigenvalue captures the most variation in the data X
  • We can compress the data by keeping only the top few eigenvectors (principal components)
  • The projected features y_k are uncorrelated (the principal components are orthogonal)
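
The process above maps directly onto a few lines of NumPy. The following is a minimal sketch, assuming the data is given as an N x d array X and the target dimension is k (both names are illustrative, not from the notes):

```python
import numpy as np

def pca(X, k):
    """Project an N x d data matrix X onto its top-k principal components."""
    # 1. Mean vector and covariance matrix of the original data
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]          # (1/N) * sum (x_i - mu)(x_i - mu)^T

    # 2. Eigenvectors / eigenvalues of Sigma (symmetric, so eigh is appropriate)
    eigvals, eigvecs = np.linalg.eigh(Sigma)

    # 3. Select the k largest eigenvalues and their eigenvectors
    order = np.argsort(eigvals)[::-1][:k]
    A = eigvecs[:, order].T                  # k x d projection matrix

    # 4. Project the centered points: y = A(x - mu)
    Y = Xc @ A.T
    return Y, A, mu

# Usage: reduce 5-dimensional points to 2 dimensions
X = np.random.randn(100, 5)
Y, A, mu = pca(X, k=2)
print(Y.shape)   # (100, 2)
```

In practice PCA is often computed via an SVD of the centered data matrix, which is numerically preferable to forming \Sigma explicitly, but the eigen-decomposition above follows the process described in these notes.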

3. Linear Discriminant Analysis (LDA)
  • PCA vs. LDA
    • PCA does not consider "class" information
    • LDA considers "class" information
    • PCA maximizes projected total scatter
    • LDA maximizes ratio of projected between-class to projected within-class scatter
  • Within-class scatter (want to minimize): \Sigma_w = \sum_{j=1}^{c} \frac{1}{N_j}\sum_{x_i \in C_j} (x_i - \mu_j)(x_i - \mu_j)^T, where C_j is the set of N_j points in class j
  • Between-class scatter (want to maximize): \Sigma_b = \frac{1}{c}\sum_{j=1}^{c} (\mu_j - \mu)(\mu_j - \mu)^T
  • Compute eigenvectors and eigenvalues of \Sigma_w^{-1}\Sigma_b: \Sigma_w^{-1}\Sigma_b v = \lambda v
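
As with PCA, the LDA projection can be sketched in NumPy. This is an illustrative sketch assuming an N x d data matrix X, integer class labels y, and target dimension k (all names are assumptions), using the 1/N_j and 1/c weightings defined above:

```python
import numpy as np

def lda(X, y, k):
    """Project an N x d data matrix X onto k discriminant directions."""
    classes = np.unique(y)
    c, d = len(classes), X.shape[1]
    mu = X.mean(axis=0)

    Sigma_w = np.zeros((d, d))   # within-class scatter (to minimize)
    Sigma_b = np.zeros((d, d))   # between-class scatter (to maximize)
    for cls in classes:
        Xj = X[y == cls]
        mu_j = Xj.mean(axis=0)
        Sigma_w += (Xj - mu_j).T @ (Xj - mu_j) / len(Xj)
        diff = (mu_j - mu).reshape(-1, 1)
        Sigma_b += diff @ diff.T / c

    # Eigen-decomposition of Sigma_w^{-1} Sigma_b (not symmetric, so use eig)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sigma_w) @ Sigma_b)
    order = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, order].real               # d x k projection matrix

    return X @ W                             # projected features

# Usage: two classes in 4 dimensions projected to 1 dimension
X = np.vstack([np.random.randn(50, 4) + 2, np.random.randn(50, 4) - 2])
y = np.array([0] * 50 + [1] * 50)
Z = lda(X, y, k=1)
print(Z.shape)   # (100, 1)
```

Note that \Sigma_b has rank at most c - 1, so LDA yields at most c - 1 useful projection directions regardless of d.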
