Slide 15
Slide 15 text
PERFORMANCE
                               k = 5    k = 9
Standard                       73.4%    72.9%
Normalized                     96%      96.6%
Normalized + cosine distance   95.7%    95.7%
* Cross validation for k = 3, 177 training records, 354 testing records
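The slide does not show the classifier itself. Assuming it is a k-nearest-neighbour classifier where k in the column headers is the number of neighbours (the mention of "cosine distance" suggests this, but it is an assumption), a minimal sketch of swapping the distance function could look like this. All function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: compares direction only, ignoring magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here: zero vectors count as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k, distance=euclidean):
    """Label the query by majority vote among its k nearest training records."""
    # Sort training records by distance to the query and keep the k closest.
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: distance(pair[0], query))[:k]
    # Counter.most_common breaks ties by first occurrence, i.e. the closer neighbour wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k, distance=euclidean):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k, distance) == y
                  for x, y in zip(test_X, test_y))
    return correct / len(test_y)


if __name__ == "__main__":
    # Toy data: two clusters of 2-D records.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(X, y, [0.5, 0.5], k=3))                            # a
    print(knn_predict(X, y, [0.5, 0.5], k=3, distance=cosine_distance))  # b
```

The two calls at the bottom show the design difference: under Euclidean distance the query [0.5, 0.5] sits next to the nearby "a" cluster, while cosine distance only looks at direction, so the far-away "b" records on the same diagonal come out closest.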
If you know the domain and have a bit of mathematical knowledge, you can easily
tweak these simple algorithms to perform better in your domain. In this example we
spotted a lot of variance in the attribute data, so we normalized each column by
rescaling it to a standard deviation of 1, i.e. converting it to a standard score
(http://en.wikipedia.org/wiki/Standard_score).
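A minimal sketch of that column-wise rescaling, assuming each record is a list of numeric attributes. It follows the linked standard-score definition, which also subtracts the column mean; the slide only mentions the standard deviation, so the centring step is an assumption. Function names are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1 (standard score)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; statistics.stdev would give the sample version.
    stds = [statistics.pstdev(col) for col in columns]
    normalized = [apply_zscore(row, means, stds) for row in rows]
    return normalized, means, stds


def apply_zscore(row, means, stds):
    """Scale one record with previously computed column statistics."""
    # A constant column has no spread to rescale, so map it to 0.0.
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
```

Computing the means and standard deviations on the training records and reusing them via `apply_zscore` for test records keeps both sets on the same scale. Without this step, a distance-based classifier is dominated by whichever attribute has the largest spread.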