# Useful Links
- Decision Trees (Part 1) - Tom Mitchell
- Decision Trees (Part 2) - Tom Mitchell
- George's notes
- Bias and Variance Neat Explanation
- Nice SVM Explanation
# Supervised learning
Function approximation: taking a set of training examples and coming up with a function that generalizes to cases beyond the data we've seen.
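A minimal sketch of the idea in plain Python: fit a line to a few training pairs by least squares, then use it on an input that wasn't in the training set (a real project would use a library instead).

```python
# Function approximation sketch: learn y = w*x + b from training pairs,
# then generalize to an unseen input.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # least-squares slope and intercept
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Training examples drawn from y = 2x + 1
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
w, b = fit_line(xs, ys)
print(w * 10 + b)  # generalizes to the unseen x=10 -> 21.0
```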
# Induction & Deduction
- Induction - Going from examples to general rules
- Deduction - Going from general rules to specific examples
# Unsupervised learning
Grouping and summarizing the training examples without any labels.
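As a small illustration, a few steps of 1-D k-means in plain Python: the points carry no labels, yet the algorithm still recovers the two groups hiding in the data. The points and starting centers below are made up.

```python
# Unsupervised grouping sketch: k-means on unlabeled 1-D points.

def kmeans_1d(points, centers, steps=10):
    for _ in range(steps):
        clusters = [[], []]
        for p in points:
            # assign each point to its nearest center
            i = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[i].append(p)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(points, [0.0, 10.0])
print(centers)  # roughly [1.0, 9.0]
```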
# Reinforcement Learning
Learning from rewards, which may be delayed. Like playing a game without knowing the rules. It looks like function approximation in supervised learning, but instead of x's and y's we get x's and z's (rewards) and must work out y.
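A tiny sketch of learning from rewards alone: an epsilon-greedy agent on a two-action bandit estimates each action's value purely from the rewards it observes, with no labels telling it the right answer. The reward probabilities are invented for illustration.

```python
# Reward-driven learning sketch: epsilon-greedy action-value estimation.
import random

random.seed(0)
true_reward = {"a": 0.2, "b": 0.8}  # hidden from the agent
counts = {a: 0 for a in true_reward}
values = {a: 0.0 for a in true_reward}

for step in range(2000):
    # explore occasionally, otherwise exploit the best current estimate
    if random.random() < 0.1:
        action = random.choice(list(true_reward))
    else:
        action = max(values, key=values.get)
    reward = 1 if random.random() < true_reward[action] else 0
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running mean

print(max(values, key=values.get))  # the agent discovers "b" pays more
```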
# Optimization
- Supervised learning: find a function that labels the data well
- Unsupervised learning: find clusters that score well
- Reinforcement learning: find behavior that scores well
# Scoring
- For classification problems with an imbalanced dataset, the F1 score should be used, since it properly reflects how well the model classifies both positive and negative cases by giving equal weight to precision and recall. Think of problems like spam or fraud detection, where positive cases make up a very small share of the dataset. ROC-AUC does a decent job as well, but it can give a high score to a model that predicts only a few positive cases correctly.
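A quick numeric sketch of why this matters: on an imbalanced problem, accuracy can look great while F1 exposes a model that misses most positives. The confusion-matrix counts below are made up for illustration.

```python
# F1 vs accuracy on an imbalanced problem.

def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1000 examples, only 20 positive (e.g. fraud); the model catches just 5.
tp, fp, fn, tn = 5, 5, 15, 975
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 3))          # 0.98  -- looks great
print(round(f1(tp, fp, fn), 3))    # 0.333 -- reveals the weak positive class
```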
- Choosing the right metric for classification problems
# VC/LC
If your model has high bias, you should:
- Try adding/creating more features
- Try decreasing the regularisation parameter λ
These two things increase your model's complexity and therefore help solve the underfitting problem.
If your model has high variance, you should:
- Get more data
- Try a smaller set of features
- Try increasing the regularisation parameter λ
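The effect of λ can be seen in a tiny sketch: one-feature ridge regression (no intercept) has the closed form w = Σ x·y / (Σ x² + λ), so raising λ shrinks the weight toward zero (less variance, more bias; too large and the model underfits). The data below are made up.

```python
# How λ trades variance for bias: closed-form one-feature ridge weight.

def ridge_weight(xs, ys, lam):
    # w = sum(x*y) / (sum(x^2) + lambda); larger lambda shrinks w
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1, 2, 3], [2, 4, 6]  # true relation y = 2x
for lam in [0.0, 1.0, 10.0]:
    print(lam, round(ridge_weight(xs, ys, lam), 3))  # weight shrinks as lam grows
```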
# Tips and tricks
- Use cross-validation where possible
- Use stratified sampling when classes aren't uniformly distributed
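A plain-Python sketch of stratified sampling: split within each class separately so both splits keep the original class ratio. (In practice indices would be shuffled first; that's omitted here to keep the sketch deterministic.)

```python
# Stratified train/test split: preserve the class ratio in both splits.
from collections import defaultdict

def stratified_split(labels, test_frac=0.25):
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        # take test_frac of each class, so the ratio carries over
        n_test = int(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test

labels = ["pos"] * 4 + ["neg"] * 12  # 25% positive overall
train, test = stratified_split(labels)
print(sum(labels[i] == "pos" for i in test) / len(test))  # 0.25 in the test split too
```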