# Feature selection
The problem is exponentially hard: with n features there are 2^n possible subsets to consider.
L refers to the learning algorithm.
# Filtering
- Faster speed
- Ignores the learning problem
- Looks at features in isolation, so it fails to account for cases where a feature only proves valuable when combined with another feature to solve a particular learning problem
# Wrapping
- Super slow
- Takes into account model bias and learning
# Criteria to use for filtering
- Information gain (like in Decision Trees)
- Variance/Entropy/Gini Index
- Independent features / non-correlated features
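
As a rough illustration of a filter criterion, here is a minimal sketch of information gain for discrete features, using only the standard library. The toy data (label = `a` XOR `b`) is made up for this example; it also shows the weakness of scoring features in isolation, since each feature alone has zero gain while the pair determines the label.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: the label is a XOR b.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [0, 1, 1, 0]

print(information_gain(a, y))                # 0.0: a alone tells us nothing
print(information_gain(b, y))                # 0.0: b alone tells us nothing
print(information_gain(list(zip(a, b)), y))  # 1.0: together they determine y
```

A filter that ranks `a` and `b` individually would discard both, which is exactly the case a wrapper method can catch.
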
# Wrapping techniques
- Randomized optimization
- Forward search
  1. Consider all features one by one and see which performs best with the learner.
  2. Add that feature to the bucket of selected features (initially empty).
  3. Pair the bucket of selected features with each remaining feature, one at a time, to find which addition gives the best outcome.
  4. Take that best-performing feature and repeat from step 2, until you no longer see a significant boost in score.
- Backward search
  1. Consider every all-except-one subset of the features with the learner and see which leads to the least loss.
  2. Eliminate that "except-one" feature.
  3. Take the remaining features and repeat from step 1, until you start seeing significant loss.
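
The forward and backward search steps can be sketched as two greedy loops. This is only a sketch under assumptions: `score` is a hypothetical callable standing in for "train the learner on this subset and measure its performance" (for example a cross-validated accuracy), and `min_gain` / `max_loss` are made-up thresholds for what counts as "significant".

```python
def forward_search(all_features, score, min_gain=1e-3):
    """Greedily add the feature that most improves score(subset)."""
    selected, best_score = [], float("-inf")
    remaining = list(all_features)
    while remaining:
        # Try the current bucket plus each remaining feature, one at a time.
        candidates = {f: score(selected + [f]) for f in remaining}
        best = max(candidates, key=candidates.get)
        if candidates[best] - best_score < min_gain:
            break  # no significant boost: stop
        selected.append(best)
        remaining.remove(best)
        best_score = candidates[best]
    return selected

def backward_search(all_features, score, max_loss=1e-3):
    """Greedily drop the feature whose removal hurts score(subset) the least."""
    selected = list(all_features)
    best_score = score(selected)
    while len(selected) > 1:
        # Score every all-except-one subset of the current features.
        candidates = {f: score([g for g in selected if g != f]) for f in selected}
        drop = max(candidates, key=candidates.get)
        if best_score - candidates[drop] > max_loss:
            break  # removing anything now costs too much: stop
        selected.remove(drop)
        best_score = candidates[drop]
    return selected

# Hypothetical stand-in for the learner: only x1 and x2 carry signal.
usefulness = {"x1": 0.5, "x2": 0.3}
def toy_score(subset):
    return sum(usefulness.get(f, 0.0) for f in subset)

features = ["x1", "x2", "noise1", "noise2"]
print(forward_search(features, toy_score))   # ['x1', 'x2']
print(backward_search(features, toy_score))  # ['x1', 'x2']
```

Each pass calls the learner once per candidate feature, so both searches need on the order of n^2 evaluations instead of 2^n, at the cost of possibly missing the best subset.
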
# Feature Relevance
# Feature Usefulness
# Feature transformation
# Principal Component Analysis (PCA)
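PCA looks at correlation between features and finds mutually orthogonal directions of maximal variance; keeping the top directions gives the best linear reconstruction of the data for a given number of components.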
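
A minimal sketch of PCA via the SVD of the centered data, assuming NumPy is available. The synthetic data (points stretched along the direction (1, 1)) and the helper name `pca` are made up for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Return the top principal directions (as rows) and their variances."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variances = S ** 2 / (len(X) - 1)
    return Vt[:n_components], variances[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Hypothetical data: strong variation along (1, 1), small noise elsewhere.
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(200, 2))

components, variances = pca(X, n_components=2)
mean = X.mean(axis=0)
Z = (X - mean) @ components[:1].T   # project onto the first component
X_hat = Z @ components[:1] + mean   # reconstruct from one component
reconstruction_error = np.mean((X - X_hat) ** 2)
```

Here the first component should line up with (1, 1) and carry almost all of the variance, so reconstructing from that single component loses very little.
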
# Independent Components Analysis (ICA)
ICA looks for a linear transformation of the data whose output features are statistically independent of one another.
Blind source separation (the cocktail party problem) is an example of what it can solve.
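
To make the cocktail party setting concrete, here is a small sketch that mixes two synthetic sources and tries to unmix them. It assumes NumPy and uses a bare-bones symmetric FastICA iteration with a tanh nonlinearity, which is one common way to do ICA and not necessarily the method these notes have in mind. The sources, the mixing matrix `A`, and the fixed iteration count are all made up for illustration.

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Bare-bones symmetric FastICA. X has shape (n_samples, n_signals)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    # Whiten: make the signals uncorrelated with unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    Z = X @ whitener
    # Start from a random orthogonal unmixing matrix.
    u, _, vt = np.linalg.svd(rng.normal(size=(Z.shape[1], Z.shape[1])))
    W = u @ vt
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)
        # Fixed-point update for each row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (g.T @ Z) / len(Z) - np.diag((1.0 - g ** 2).mean(axis=0)) @ W
        # Symmetric decorrelation keeps the rows orthonormal.
        u, _, vt = np.linalg.svd(W_new)
        W = u @ vt
    return Z @ W.T

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 2000)
# Two hypothetical "speakers": a sine wave and uniform noise.
S = np.column_stack([np.sin(2 * t), rng.uniform(-1, 1, size=t.size)])
A = np.array([[1.0, 0.5], [0.5, 1.0]])  # made-up mixing ("microphones")
X = S @ A.T

recovered = fast_ica(X)
# Absolute correlation between each true source and each recovered signal.
corr = np.abs(np.corrcoef(S.T, recovered.T)[:2, 2:])
```

ICA recovers sources only up to order, sign, and scale, which is why the comparison uses absolute correlations instead of checking the signals directly.
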
# PCA vs ICA
- Mutually orthogonal: PCA (this is what makes PCA a global algorithm)
- Mutually independent: ICA
- Maximal variance: PCA
- Maximal mutual information (between the original and the transformed features): ICA
- Ordered features: PCA
- Bag of features: ICA, PCA
ICA is highly directional, whereas PCA is not.
- Therefore, ICA is great for sound waves, where direction is important.
- ICA is also great for images like faces, where direction is important. It ends up detecting noses, eyes, contours, etc., whereas PCA finds brightness and the average face, so basically anything global.
- For pictures of the world, ICA detects edges, whereas PCA still behaves the same as for faces above.
- For documents, ICA gives topics.
# Random Components Analysis (RCA)
Generates random directions and projects the data onto them.
Big advantage of RCA: it is fast.
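
A minimal sketch of projecting onto random directions, assuming NumPy. Drawing the directions from a Gaussian and scaling by `1 / sqrt(n_components)` is one common choice (it keeps squared lengths roughly unchanged on average), not something these notes prescribe; the data is synthetic.

```python
import numpy as np

def random_projection(X, n_components, seed=0):
    """Project X onto n_components random Gaussian directions."""
    rng = np.random.default_rng(seed)
    # One random direction per column; no fitting to the data is needed.
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))            # hypothetical 50-dimensional data
Z = random_projection(X, n_components=20)

# Ratio of total squared length after vs. before projection (expected near 1).
length_ratio = (Z ** 2).sum() / (X ** 2).sum()
```

The speed comes from the fact that the only work is drawing a random matrix and doing one matrix multiplication.
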
# Linear Discriminant Analysis (LDA)
Finds a projection that discriminates based on the label.
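
A minimal sketch of the two-class case (Fisher's linear discriminant), assuming NumPy. The synthetic data is made up so that the classes differ along the second axis while most of the variance lies along the first, which is where a label-aware projection and a variance-based one would disagree.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Unit vector w maximizing between-class over within-class scatter (2 classes)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
cov = [[3.0, 0.0], [0.0, 0.2]]   # large variance along axis 0
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=200),  # class 0
    rng.multivariate_normal([0.0, 2.0], cov, size=200),  # class 1, shifted along axis 1
])
y = np.array([0] * 200 + [1] * 200)

w = fisher_lda_direction(X, y)
projected = X @ w   # 1-D feature that separates the two classes
```

On this data `w` should point almost entirely along the second axis, even though the first axis has far more variance.
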