– 5 million observations
• arXiv
  – 120,297 users, 825,707 articles
  – 43 million observations
  – 10 years (2003-2012)
  – [test data] 64,978 users, 636,978 articles, 7.6 million clicks (2012)
• treat user clicks as binary data
• remove stop words
• top 10,000 (14,000 for arXiv) distinct words as the vocabulary (use tf-idf)
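
The preprocessing above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list is a tiny hypothetical stand-in, the names `tokenize`, `build_vocabulary`, and `binarize_clicks` are made up for this example, and ranking words by their tf-idf score summed over all documents is only one plausible reading of "use tf-idf".

```python
import math
from collections import Counter

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocabulary(docs, vocab_size):
    """Keep the vocab_size words with the highest summed tf-idf score.

    Assumption: tf is the raw count in a document and idf is
    log(n_docs / document_frequency). Ties are broken alphabetically.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each word.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Sum tf-idf over all documents for each word.
    scores = Counter()
    for tokens in tokenized:
        for word, count in Counter(tokens).items():
            scores[word] += count * math.log(n_docs / doc_freq[word])

    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:vocab_size]


def binarize_clicks(click_counts):
    """Turn {(user, article): count} into the set of clicked pairs."""
    return {pair for pair, count in click_counts.items() if count > 0}
```

With the sizes from the slide, the call would be `build_vocabulary(abstracts, 10_000)` (or `14_000` for arXiv), where `abstracts` is a hypothetical list of article texts. Binarizing discards how often a user clicked an article and keeps only whether they did.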