@jaytaph
German tank problem: estimated vs. actual monthly tank production
Month        Intelligence  Statistics  Actual
June 1940    1000          169         122
June 1941    1550          244         271
August 1942  1550          327
https://en.wikipedia.org/wiki/German_tank_problem
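A minimal sketch of how a statistical estimate like the ones above can be produced from leaked sequential numbers. It uses the standard frequentist estimator for the German tank problem, N ≈ m + m/k − 1, where m is the largest serial number observed and k is the number of observations. The function name and the sample serials are illustrative assumptions, and the sketch assumes serials start at 1 and are sampled uniformly without replacement.

```python
def estimate_total(serials):
    """Estimate the population size from observed sequential serial numbers.

    Applies N ~= m + m/k - 1, with m the largest serial seen and k the
    number of observations.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# Hypothetical sample: four captured serial numbers.
observed = [19, 40, 42, 60]
print(estimate_total(observed))  # 74.0
```

The same arithmetic works on any sequential identifier an application exposes, such as order numbers, invoice numbers, or user IDs.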
➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs; you don't need UUIDs).
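One possible shape for a time-based short ID, as a sketch only. The slide does not specify a scheme, so everything here is an assumption: the names `short_id` and `encode`, the 45-bit millisecond timestamp, the 40 random bits, and the Crockford-style base32 alphabet.

```python
import secrets
import time

# Crockford-style base32 alphabet (no I, L, O, U); characters are in ASCII order,
# so string comparison matches numeric comparison for equal-length values.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(value, length):
    """Encode a non-negative integer as a fixed-length base32 string."""
    chars = []
    for _ in range(length):
        value, rem = divmod(value, 32)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def short_id(now_ms=None):
    """Return a 17-character ID: 9 chars of timestamp plus 8 random chars."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    ts = now_ms & ((1 << 45) - 1)  # 45 bits of milliseconds, 9 base32 chars
    rnd = secrets.randbits(40)     # 40 random bits, 8 base32 chars
    return encode(ts, 9) + encode(rnd, 8)


print(short_id())
```

IDs created in different milliseconds sort by creation time, and the random suffix carries no counter. The timestamp prefix does reveal when a record was created, but not how many records exist.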
Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B)
P(A|B): probability of event A, given event B (conditional)
P(A): probability of event A
P(B): probability of event B
P(B|A): probability of event B, given event A (conditional)
➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".
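The counts above can be plugged into Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), with A = "comment is negative" and B = "comment contains horrible". The sketch below works through the arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

total = 50
negative = 10               # comments marked "negative"
horrible = 25               # comments containing "horrible"
horrible_and_negative = 8   # comments that are both

p_negative = Fraction(negative, total)                                 # P(A)   = 1/5
p_horrible = Fraction(horrible, total)                                 # P(B)   = 1/2
p_horrible_given_negative = Fraction(horrible_and_negative, negative)  # P(B|A) = 4/5

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible

print(p_negative_given_horrible, float(p_negative_given_horrible))  # 8/25 0.32
```

So a comment containing "horrible" has a 32% chance of being negative, which matches counting directly: 8 of the 25 "horrible" comments are negative.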
➡ You might want to filter stop-words first.
➡ You might want to make sure negatives are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.
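A rough sketch of the first two points: drop stop-words and fold a negation into the next meaningful word, so that "not great" becomes a single `not_great` token. The stop-word and negation lists are tiny illustrative assumptions, and the `not_` prefix is just one common convention. Sarcasm is not handled.

```python
import re

# Tiny illustrative lists; a real stop-word list would be much longer.
# Note that "not" must stay out of the stop-word list, or negation is lost.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase, remove stop-words, and prefix negated words with 'not_'."""
    words = re.findall(r"[a-z']+", text.lower())
    tokens = []
    negate = False
    for word in words:
        if word in NEGATIONS:
            negate = True      # remember the negation for the next kept word
            continue
        if word in STOP_WORDS:
            continue           # stop-words do not consume a pending negation
        tokens.append("not_" + word if negate else word)
        negate = False
    return tokens


print(tokenize("This is not great"))      # ['not_great']
print(tokenize("The food was horrible"))  # ['food', 'horrible']
```

Contractions such as "isn't" are left as ordinary tokens here and would need their own handling.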