# Dope Tools
- Google Refine - for data cleaning/transformation
- Wrangler - for data cleaning/transformation
- D-Dupe - for entity resolution; finds potential duplicates by comparing attributes and similar neighbors
# Terms
- Entity Resolution: Identifying records that refer to the same real-world entity. Think of consolidating the same product sold by different sellers under one umbrella/product page. Similarity comparisons are very useful here.
- Co-occurrence Grouping: Also known as frequent itemset mining, association rule discovery, or market basket analysis
  - Finds associations between entities based on the transactions that involve them
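
A minimal sketch of co-occurrence grouping, assuming each transaction is just a list of item names. The `pair_support` helper and the sample baskets are made up for illustration; real frequent itemset miners (e.g. Apriori) go beyond pairs and prune by a minimum support threshold.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so every pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical market baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

support = pair_support(baskets)
for pair, count in support.most_common(3):
    print(pair, count)
```

Pairs that appear together in many transactions are candidates for association rules such as "customers who buy X also buy Y".
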
# Similarity functions
- Euclidean Distance
- Manhattan Distance
- Jaccard Similarity
  - Overlap of nodes' neighbors.
  - Jaccard similarity of sets S and T is $J(S, T) = \frac{|S \cap T|}{|S \cup T|}$
  - Value of 1 means complete overlap, 0 means no overlap
- String edit distance
  - Measures how many textual transformations (e.g. inserting, deleting, or substituting a character) are needed to turn one string into another
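
The functions listed above can be sketched in plain Python. This is a rough illustration rather than a reference implementation: the function names are my own, the edit distance shown is the Levenshtein variant (insert, delete, substitute, each costing 1), and returning 1.0 for two empty sets in `jaccard` is an assumed convention.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def jaccard(s, t):
    """|S ∩ T| / |S ∪ T|: 1 means complete overlap, 0 means none."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(s & t) / len(s | t)


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
print(edit_distance("kitten", "sitting"))
```

For entity resolution, `jaccard` could be applied to the neighbor sets of two nodes and `edit_distance` to their name attributes.
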
# Visualization Techniques
# Pre-attentively processed features
# Gestalt Psychology
Describes 8 principles by which we perceive groupings in the real world:
- Proximity
- Similarity
- Closure
- Symmetry
- Common Fate
- Continuity
- Good Gestalt
- Past Experience
# Color schemes
A useful site is ColorBrewer.