Notes from ML for Trading
# What's Involved
Features/Factors: X1, X2, X3
Labels: Y
Nodes
- feature used
- split value
- left link
- right link
Root
Leaves
# The Data Structure
Using a 2-dimensional array is actually quite efficient, compared to building an object-oriented tree
- The value in the left/right column points to the index in the array where the next node can be found, depending on whether we are going left or right.
- A factor of -1 indicates a leaf node.
- Instead of using absolute indexes in the left/right columns, we can also use a relative value to jump to the next relevant row, e.g. in row 1, instead of entering the absolute index 5 in the left column, you can write 4, since it takes a jump of +4 to get from row 1 to row 5.
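A minimal sketch of that flat-array representation in NumPy, assuming a column layout of [factor, split value, left, right], relative offsets in the left/right columns, and a factor of -1 to mark a leaf (whose split-value column is then read as the value to return):

```python
import numpy as np

# Column layout (assumed): factor | split value | left offset | right offset
# A factor of -1 marks a leaf; the split-value column then holds the leaf's value.
tree = np.array([
    [ 0, 10.0,      1,      2],  # row 0: split on feature 0 at 10.0
    [-1,  5.0, np.nan, np.nan],  # row 1: leaf, return 5.0
    [ 1,  3.0,      1,      2],  # row 2: split on feature 1 at 3.0
    [-1,  8.0, np.nan, np.nan],  # row 3: leaf, return 8.0
    [-1, 12.0, np.nan, np.nan],  # row 4: leaf, return 12.0
])

def query(tree, x):
    """Walk the array from the root (row 0) until a leaf is reached."""
    row = 0
    while tree[row, 0] != -1:                 # not a leaf yet
        factor, split_val = int(tree[row, 0]), tree[row, 1]
        if x[factor] <= split_val:
            row += int(tree[row, 2])          # relative jump to the left child
        else:
            row += int(tree[row, 3])          # relative jump to the right child
    return tree[row, 1]

print(query(tree, [9.0, 4.0]))   # left at the root  -> 5.0
print(query(tree, [11.0, 2.5]))  # right, then left  -> 8.0
```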
# Constructing the Decision Tree
- We use relative indexes instead of absolute indexes for the left/right columns in this algorithm (a rough sketch follows below)
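A rough sketch of such a construction, assuming the label Y sits in the last column of `data`, the split value is the median of the chosen feature, and a `best_feature` helper (sketched under the next heading) picks the feature to split on. The relative offsets fall out naturally because each child subtree is stacked directly below its parent row:

```python
import numpy as np

def build_tree(data, leaf_size=1):
    """Recursively build the flat-array tree. Each row is
    [factor, split value, relative left offset, relative right offset]."""
    # Stop: few records left, or all labels identical -> make a leaf
    if data.shape[0] <= leaf_size or np.all(data[:, -1] == data[0, -1]):
        return np.array([[-1, data[:, -1].mean(), np.nan, np.nan]])

    factor = best_feature(data)                  # assumed helper, see next section
    split_val = np.median(data[:, factor])

    left_mask = data[:, factor] <= split_val
    if left_mask.all() or (~left_mask).all():    # split separates nothing -> leaf
        return np.array([[-1, data[:, -1].mean(), np.nan, np.nan]])

    left = build_tree(data[left_mask], leaf_size)
    right = build_tree(data[~left_mask], leaf_size)

    # Left child is the very next row (+1); right child starts after the
    # entire left subtree (+1 + number of rows in the left subtree).
    root = np.array([[factor, split_val, 1, left.shape[0] + 1]])
    return np.vstack([root, left, right])
```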
# Determining Best Feature
While constructing the decision tree, we continuously split the data and work with a smaller subset at every subtree as we go down. As the dataset gets narrower and narrower, a different feature may turn out to best describe the relationship to the label.
When you get to the lowest subtree and have only 3-4 records, it is likely you will get the same correlation/entropy value for all features. In such a case, you can pick deterministically by order, or randomly.
Recommended: Randomly
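A minimal sketch of correlation-based feature selection with the random tie-break recommended above (the `best_feature` name and the layout, features first and label last, are assumptions):

```python
import numpy as np

def best_feature(data, rng=np.random.default_rng()):
    """Pick the feature with the highest |correlation| with the label (last column),
    breaking ties randomly."""
    y = data[:, -1]
    scores = []
    for i in range(data.shape[1] - 1):
        x = data[:, i]
        if np.std(x) == 0 or np.std(y) == 0:
            scores.append(0.0)            # correlation undefined -> treat as no signal
        else:
            scores.append(abs(np.corrcoef(x, y)[0, 1]))
    scores = np.array(scores)
    candidates = np.flatnonzero(scores == scores.max())
    return int(rng.choice(candidates))    # random pick among equally good features
```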
# Approaches
- Information gain: Entropy (measures the randomness and diversity of data)
- Information gain: Correlation (with the label)
- Information gain: Gini index
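For the entropy and Gini variants, a rough sketch of the impurity measures and the resulting information gain for a discrete label vector (function names are illustrative):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a discrete label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    """Gini index (impurity) of a discrete label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(y, mask, impurity=entropy):
    """Reduction in impurity from splitting labels y into y[mask] and y[~mask]."""
    n = len(y)
    left, right = y[mask], y[~mask]
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(y) - weighted
```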
# Random Tree Algorithm
The decision tree algorithm is great, but the bottleneck is computing the best feature to split on, and also computing the split value. That's where the random tree algorithm makes changes.
While a single random tree in itself is weak compared to a decision tree, if we adopt bagging and construct a bunch of random trees, they lead to better results.
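A hedged sketch of that idea, reusing the flat-array layout from above: the split feature is chosen at random and the split value is derived cheaply (here, the mean of the feature values of two randomly chosen rows, one common variant), and bagging then trains many such trees on bootstrap samples:

```python
import numpy as np

def build_random_tree(data, leaf_size=1, rng=np.random.default_rng()):
    """Like build_tree above, but with no best-feature search or median computation."""
    if data.shape[0] <= leaf_size or np.all(data[:, -1] == data[0, -1]):
        return np.array([[-1, data[:, -1].mean(), np.nan, np.nan]])

    factor = rng.integers(data.shape[1] - 1)        # random feature
    r1, r2 = rng.integers(data.shape[0], size=2)    # two random rows
    split_val = (data[r1, factor] + data[r2, factor]) / 2.0

    left_mask = data[:, factor] <= split_val
    if left_mask.all() or (~left_mask).all():
        return np.array([[-1, data[:, -1].mean(), np.nan, np.nan]])

    left = build_random_tree(data[left_mask], leaf_size, rng)
    right = build_random_tree(data[~left_mask], leaf_size, rng)
    root = np.array([[factor, split_val, 1, left.shape[0] + 1]])
    return np.vstack([root, left, right])

def bag_of_trees(data, n_trees=20, rng=np.random.default_rng()):
    """Bagging: each tree is trained on a bootstrap sample; queries are averaged."""
    bags = [data[rng.integers(data.shape[0], size=data.shape[0])] for _ in range(n_trees)]
    return [build_random_tree(bag, rng=rng) for bag in bags]
```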
# Strengths & Weaknesses
- Cost of learning is higher compared to instance-based algorithms like KNN or parametric algorithms like linear regression.
- Cost of query is lower than something like KNN, but higher than parametric algorithms like linear regression.
- No need to normalize the data as in algorithms like KNN, so features with different value ranges are all treated as equally important.
# Evaluating Trees
- The more nodes a tree has, the more perfectly the model fits the training data. As a result, overfitting might occur, causing accuracy on the training data to be high but accuracy on the test data to be low. So we should look to generalize.
- The closer the correlation between Y_predict & Y_test is to 1, the better.
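A small sketch of that evaluation, assuming NumPy arrays of predictions and true labels (RMSE is included alongside the correlation as a common companion metric):

```python
import numpy as np

def evaluate(y_test, y_predict):
    """Return (RMSE, correlation); correlation closer to 1 is better."""
    rmse = np.sqrt(np.mean((y_test - y_predict) ** 2))
    corr = np.corrcoef(y_predict, y_test)[0, 1]
    return rmse, corr

# Hypothetical usage: a large gap between in-sample and out-of-sample error
# suggests the tree is overfitting the training data.
# in_rmse, in_corr = evaluate(y_train, predictions_on_train)
# out_rmse, out_corr = evaluate(y_test, predictions_on_test)
```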
Notes from ML Course
# Expressiveness of Decision Trees
Analyzing the expressiveness of decision trees using simple logic gates
# OR
- Solving OR problems creates a decision tree that grows linearly with the number of attributes
# XOR
- Solving XOR problems creates a decision tree that grows exponentially with the number of attributes
- Odd parity means the answer is true if the number of true variables is odd (the n-attribute generalization of XOR)
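Roughly, for n boolean attributes:

$$\text{size}_{\text{OR}}(n) = O(n), \qquad \text{size}_{\text{parity}}(n) = O(2^n)$$

The OR tree is a single chain that tests one attribute per level and stops at the first true; odd parity has to test every attribute on every path, so the tree needs on the order of $2^n$ leaves.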
# Bias
# Restriction Bias
Bias due to the hypothesis set we care about. For decision trees, that would be the set of all possible decision trees. This means we're not considering linear or quadratic functions, or non-boolean functions; we only care about what can be represented by a decision tree.
# Preference Bias
Heart of inductive bias
What do we prefer out of all the possible hypotheses we care about? For ID3 decision trees, since we make decisions top-down, we prefer trees that
- make good splits near the top
- correct modeling
- shorter trees
# Considerations
- It isn't right to repeat an attribute down the same path of the tree if it is a binary/discrete attribute, since you would've already gotten its answer earlier in the path. But it is perfectly acceptable to repeat a continuous attribute down the same path, asking a different question each time. For example, we might group people under 20 earlier in the tree, and then identify teenagers by restricting the age range again further down (a tiny illustration follows).
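A tiny illustration of that age example as nested splits (the thresholds are illustrative only):

```python
def classify(age):
    # The continuous attribute "age" is legitimately tested twice on the same path,
    # each time against a different threshold.
    if age < 20:            # first split: under 20?
        if age >= 13:       # second split on the same attribute: teenager?
            return "teenager"
        return "child"
    return "adult"
```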
# Avoiding Overfitting
- Cross-validation: construct multiple trees and pick the one with the least error on the held-out set
- While building a tree, every time before expanding it, check against the validation set; if the error is already low enough, you can stop.
- Suggested: Pruning: first build the full tree without worrying about anything. Once it is done, go through it attempting to remove nodes, and check how removing each node would affect the overall accuracy of the tree. Only remove a node if it has a minimal effect on the accuracy (a rough sketch follows below).
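A rough sketch of that pruning pass, using a nested-dict tree for clarity instead of the flat array above; for simplicity the replacement leaf predicts the mean validation label reaching it (in practice the leaf value would typically come from the training data at that node):

```python
import numpy as np

def predict_one(node, x):
    """Query a nested-dict tree: {'factor', 'split_val', 'left', 'right'} or {'value': v}."""
    if 'value' in node:
        return node['value']
    child = node['left'] if x[node['factor']] <= node['split_val'] else node['right']
    return predict_one(child, x)

def prune(node, x_val, y_val, tolerance=0.0):
    """Bottom-up pruning: collapse a subtree into a leaf whenever doing so costs at
    most `tolerance` extra squared error on the validation data routed to it."""
    if 'value' in node or len(y_val) == 0:
        return node

    # Route the validation records down the tree and prune the children first.
    mask = x_val[:, node['factor']] <= node['split_val']
    node['left'] = prune(node['left'], x_val[mask], y_val[mask], tolerance)
    node['right'] = prune(node['right'], x_val[~mask], y_val[~mask], tolerance)

    # Error if we keep this subtree vs. replace it with a single leaf.
    keep_err = np.mean([(predict_one(node, x) - y) ** 2 for x, y in zip(x_val, y_val)])
    leaf_val = float(np.mean(y_val))
    cut_err = np.mean((y_val - leaf_val) ** 2)

    return {'value': leaf_val} if cut_err <= keep_err + tolerance else node
```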