Slide 9
Dataset
• Mendeley
– 80,000 users, 260,000 articles
– 5 million observations
• arXiv
– 120,297 users, 825,707 articles
– 43 million observations
– 10 years (2003-2012)
– [test data] 64,978 users, 636,978 articles, 7.6 million clicks (2012)
• treat user clicks as binary data
• remove stop words
• top 10,000 (14,000 for arXiv) distinct words as the vocabulary (use tf-idf)
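
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy stop-word list, whitespace tokenization, and the choice to rank words by their tf-idf score summed over all documents are all assumptions, since the slide only says that tf-idf is used to pick the vocabulary.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list; the slide does not say which list was used.
STOP_WORDS = {"the", "a", "of", "and", "in", "we", "for", "to", "is"}


def binarize_clicks(click_counts):
    """Reduce {(user, article): count} to the set of pairs with at least one click."""
    return {pair for pair, count in click_counts.items() if count > 0}


def build_vocabulary(docs, vocab_size):
    """Return the vocab_size words with the highest summed tf-idf score.

    Assumption: scores are aggregated by summing tf * idf over documents;
    ties are broken alphabetically so the result is deterministic.
    """
    # Lowercase, split on whitespace, and drop stop words.
    tokenized = [
        [w for w in doc.lower().split() if w not in STOP_WORDS] for doc in docs
    ]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each word.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Sum tf * idf across documents for each word.
    scores = Counter()
    for tokens in tokenized:
        for word, tf in Counter(tokens).items():
            scores[word] += tf * math.log(n_docs / doc_freq[word])

    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:vocab_size]


if __name__ == "__main__":
    clicks = {("u1", "a1"): 3, ("u1", "a2"): 0, ("u2", "a1"): 1}
    print(binarize_clicks(clicks))

    docs = [
        "the poisson model of poisson counts",
        "a topic model",
        "topic model for text",
    ]
    print(build_vocabulary(docs, 3))
```

On a real corpus, vocab_size would be 10,000 (or 14,000 for arXiv), as stated on the slide. A word that appears in every document gets an idf of zero under this scoring, so it drops to the bottom of the ranking even if it is not on the stop-word list.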