Slide 4
Slide 4 text
Overview of each language model
[4a-3] Yohei Kikuta 4/6
[Slide figure: the models minimize $\arg\min_{\theta,\theta_D}\,\bigl[\alpha C_R(\theta) + (1-\alpha) C_D(\theta_D)\bigr]$ with $C_D \propto -\sum_{(u,i)} \log p\bigl(d_{u,i} = (w^{(1)}_{u,i}, \cdots, w^{(n_{u,i})}_{u,i}) \mid \gamma_i\bigr)$; HFT and BoWLF model the review probability $p(d_{u,i} \mid \gamma_i)$ as a whole, while LMLF models each word probability $p(w^{(t)}_{u,i} = j \mid \gamma_i)$ through an output vector $y$.]
We jointly optimize the rating prediction model in Eq. (1) and the review model in Eq. (3) by minimizing the convex combination of $C_R$ in Eq. (2) and $C_D$ in Eq. (4):

$$\arg\min_{\theta, \theta_D} \; \alpha\, C_R(\theta) + (1 - \alpha)\, C_D\bigl(\theta_D, \{\gamma_i\}_{i=1}^{M}\bigr), \qquad (6)$$

where the coefficient $\alpha$ is a hyperparameter.
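To make the convex combination in Eq. (6) concrete, here is a minimal Python/NumPy sketch; the two cost functions and the parameter shapes are toy stand-ins for illustration, not the paper's actual models.

import numpy as np

def joint_cost(theta, theta_D, gammas, alpha, C_R, C_D):
    """Convex combination of the rating cost C_R and the review cost C_D, as in Eq. (6)."""
    return alpha * C_R(theta) + (1.0 - alpha) * C_D(theta_D, gammas)

# Toy stand-in costs, assumed only for illustration.
C_R = lambda theta: float(np.sum(theta ** 2))
C_D = lambda theta_D, gammas: float(np.sum(theta_D ** 2) + np.sum(gammas ** 2))

theta = np.ones(5)          # rating-model parameters (toy)
theta_D = np.ones(3)        # review-model parameters (toy)
gammas = np.ones((4, 3))    # product representations {gamma_i}, one row per product (toy)
print(joint_cost(theta, theta_D, gammas, alpha=0.5, C_R=C_R, C_D=C_D))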
.1 BoWLF: Distributed Bag-of-Words
The first model we propose to use is a distributed bag-of-words prediction. In this case, we represent each review as a bag of words, meaning
$$d_{u,i} = \bigl(w^{(1)}_{u,i}, \cdots, w^{(n_{u,i})}_{u,i}\bigr) \approx \bigl\{w^{(1)}_{u,i}, \cdots, w^{(n_{u,i})}_{u,i}\bigr\}. \qquad (7)$$
This leads to

$$p(d_{u,i} \mid \gamma_i) = \prod_{t=1}^{n_{u,i}} p\bigl(w^{(t)}_{u,i} \mid \gamma_i\bigr).$$
We model $p\bigl(w^{(t)}_{u,i} \mid \gamma_i\bigr)$ as an affine transformation of the product representation $\gamma_i$ followed by the so-called softmax normalization:

$$p\bigl(w^{(t)}_{u,i} = j \mid \gamma_i\bigr) = \frac{\exp\{y_j\}}{\sum_{l=1}^{|V|} \exp\{y_l\}}, \qquad (8)$$

where $y = W \gamma_i + b$.
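A minimal NumPy sketch of the BoWLF likelihood in Eqs. (7)-(8): each word of the review is scored by a softmax over an affine transform of the product vector $\gamma_i$, and the bag-of-words log-likelihood is the sum of the per-word log-probabilities. The parameter names, dimensions, and the toy data are assumptions for illustration, not the paper's code.

import numpy as np

def bowlf_log_likelihood(word_ids, gamma_i, W, b):
    """log p(d_{u,i} | gamma_i) under the bag-of-words model (Eqs. 7-8).

    word_ids : indices of the words in the review d_{u,i}
    gamma_i  : product representation, shape (k,)
    W, b     : affine parameters, shapes (|V|, k) and (|V|,)
    """
    y = W @ gamma_i + b                       # y = W gamma_i + b
    y = y - y.max()                           # numerical stability for the softmax
    log_probs = y - np.log(np.exp(y).sum())   # log softmax over the vocabulary (Eq. 8)
    return log_probs[word_ids].sum()          # sum over words: order is ignored (Eq. 7)

rng = np.random.default_rng(0)
V, k = 1000, 8                                # assumed vocabulary size and factor dimension
W, b = rng.normal(size=(V, k)), np.zeros(V)
gamma_i = rng.normal(size=k)
review = [3, 17, 17, 256]                     # toy review as word indices
print(bowlf_log_likelihood(review, gamma_i, W, b))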
The second model, LMLF, does not make any assumption on how each review is represented, but takes a sequence of words as it is, preserving the order of the words. In this case, we model the probability over a review, which is a variable-length sequence of words, by rewriting the probability as

$$p\bigl(d_{u,i} = (w^{(1)}_{u,i}, \cdots, w^{(n_{u,i})}_{u,i}) \mid \gamma_i\bigr) = p\bigl(w^{(1)}_{u,i} \mid \gamma_i\bigr) \prod_{t=2}^{n_{u,i}} p\bigl(w^{(t)}_{u,i} \mid w^{(1)}_{u,i}, \cdots, w^{(t-1)}_{u,i}, \gamma_i\bigr).$$

We approximate each conditional distribution $p\bigl(w^{(t)}_{u,i} = j \mid w^{(1)}_{u,i}, \cdots, w^{(t-1)}_{u,i}, \gamma_i\bigr)$ with a softmax, as in Eq. (8), computed from the hidden state of a recurrent function

$$h^{(t)} = \phi\bigl(h^{(t-1)}, w^{(t-1)}_{u,i}, \gamma_i\bigr).$$

There are a number of choices available for the recurrent function $\phi$. Here, we use the long short-term memory (LSTM, [9]), which has recently been applied successfully to natural language-related tasks. In the case of the LSTM, the recurrent function returns, in addition to its hidden state $h^{(t)}$, the memory cell $c^{(t)}$, such that

$$\bigl[h^{(t)}; c^{(t)}\bigr] = \phi\bigl(h^{(t-1)}, c^{(t-1)}, w^{(t-1)}_{u,i}, \gamma_i\bigr),$$

where

$$h^{(t)} = o^{(t)} \odot \tanh\bigl(c^{(t)}\bigr), \qquad c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}.$$

The output $o$, forget $f$ and input $i$ gates are computed by

$$\bigl[o^{(t)}; f^{(t)}; i^{(t)}\bigr] = \sigma\bigl(V_g\, E\bigl[w^{(t-1)}_{u,i}\bigr] + W_g\, h^{(t-1)} + U_g\, c^{(t-1)} + A_g\, \gamma_i\bigr),$$

and the new memory content $\tilde{c}^{(t)}$ by

$$\tilde{c}^{(t)} = \tanh\bigl(V_c\, E\bigl[w^{(t-1)}_{u,i}\bigr] + W_c\, h^{(t-1)} + U_c\, c^{(t-1)} + A_c\, \gamma_i\bigr).$$