Estimated German tank production per month (source: https://en.wikipedia.org/wiki/German_tank_problem)

Month        Intelligence  Statistics  Actual
June 1940    1000          169         122
June 1941    1550          244         271
August 1942  1550          327         342
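The "Statistics" column is based on estimating a total from captured serial numbers. Below is a minimal sketch of the textbook minimum-variance unbiased estimator from the linked article, N ≈ m + m/k − 1, where m is the largest serial number seen and k is the number of samples. It assumes serials are drawn without replacement from 1..N. The function name and the sample ids are hypothetical illustration values.

```python
def estimate_total(serials):
    """Estimate the total number of items N from a sample of serial numbers
    assumed to be drawn without replacement from 1..N (German tank problem)."""
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # Minimum-variance unbiased estimator: m + m/k - 1
    return m + m / k - 1


# Hypothetical sample: four "captured" invoice ids
print(estimate_total([19, 40, 42, 60]))  # 60 + 15 - 1 = 74.0
```

The same arithmetic works on any sequential id an outsider can observe, such as user ids or invoice numbers.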
➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.
➡ Approximate the size of a dataset from incomplete information.
➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs; you don't need UUIDs).
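Two sketches for the points above. First, why a secret offset does not help: if ids run from offset+1 to offset+N, the spread max − min of k observed ids still estimates N, because the offset cancels out; (max − min)·(k+1)/(k−1) − 1 is unbiased for sampling without replacement. Second, one possible time-based short id, modelled on the ULID layout (48-bit millisecond timestamp plus 80 random bits, 26 Crockford base32 characters). Both function names are hypothetical, and the id generator is an illustration, not a spec-checked ULID library.

```python
import random
import secrets
import time

# Crockford base32 alphabet (no I, L, O, U), as used by ULID
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def estimate_total_with_offset(ids):
    """Estimate how many sequential ids were issued when numbering starts at an
    unknown offset. Only the spread max - min is used, so the offset cancels out."""
    k = len(ids)
    if k < 2:
        raise ValueError("need at least two observed ids")
    spread = max(ids) - min(ids)
    # Unbiased for k ids sampled without replacement from offset+1 .. offset+N
    return spread * (k + 1) / (k - 1) - 1


def short_time_id(now_ms=None):
    """Hypothetical ULID-style id: 48-bit millisecond timestamp followed by
    80 random bits, encoded as 26 Crockford base32 characters."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    timestamp = now_ms & ((1 << 48) - 1)
    value = (timestamp << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):               # 26 * 5 = 130 bits >= 128 bits
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))   # most significant character first


# Hypothetical shop: 500 orders numbered from a secret offset of 1_000_000
random.seed(1)
sample = random.sample(range(1_000_001, 1_000_501), 20)
print(round(estimate_total_with_offset(sample)))  # an estimate of N; the offset plays no role
print(short_time_id())
```

Because the timestamp occupies the most significant bits, these ids still sort by creation time; the timestamp prefix reveals when an id was made, but the 80 random bits reveal nothing about how many ids exist.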
➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".
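One way to read these numbers is through Bayes' theorem, P(negative | "horrible") = P("horrible" | negative) · P(negative) / P("horrible") = (8/10 · 10/50) / (25/50). A minimal sketch using exact fractions, with the counts taken from the bullets above:

```python
from fractions import Fraction

total = 50
negative = 10                # comments labelled "negative"
horrible = 25                # comments containing "horrible"
horrible_and_negative = 8    # comments with "horrible" labelled "negative"

p_negative = Fraction(negative, total)                                 # 1/5
p_horrible = Fraction(horrible, total)                                 # 1/2
p_horrible_given_negative = Fraction(horrible_and_negative, negative)  # 4/5

# Bayes' theorem: P(neg | horrible) = P(horrible | neg) * P(neg) / P(horrible)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible
print(p_negative_given_horrible)  # 8/25
```

The result, 8/25 = 0.32, matches counting directly (8 of the 25 "horrible" comments are negative). It is only slightly above the 0.2 base rate, so "horrible" on its own is weak evidence of a negative comment.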
Negative: "Your product is horrible and does not work properly. Also, you suck."
Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"
➡ You might want to filter stop-words first.
➡ You might want to make sure negations are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.
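A minimal tokenizer sketch covering the first two points. The stop-word list is a deliberately tiny hypothetical one, and the negation handling uses a common heuristic: after a negation word, prefix each following word with NOT_ until the next punctuation mark. Negations are checked before stop-words because many stock stop-word lists include "not", which would otherwise be thrown away.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects use a curated one
STOP_WORDS = {"a", "an", "and", "also", "but", "does", "had", "i", "is", "it",
              "the", "this", "with", "you", "your", "yours"}
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "isn't"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def tokenize(text):
    """Lowercase, drop stop-words, and prefix words that follow a negation
    with NOT_ until the next punctuation mark ("not great" -> "NOT_great")."""
    tokens = []
    negated = False
    for match in re.finditer(r"[a-z']+|[.,!?;]", text.lower()):
        word = match.group()
        if word in PUNCTUATION:
            negated = False        # punctuation ends the negation scope
            continue
        if word in NEGATIONS:      # checked before stop-words on purpose
            negated = True
            continue
        if word in STOP_WORDS:
            continue
        tokens.append("NOT_" + word if negated else word)
    return tokens


print(tokenize("not great"))  # ['NOT_great']
print(tokenize("Your product is horrible and does not work properly. Also, you suck."))
# ['product', 'horrible', 'NOT_work', 'NOT_properly', 'suck']
```

Sarcasm is out of reach for a word-level heuristic like this one.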
Find me on Twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me for blogs: www.adayinthelifeof.nl