Slide 27
Application of speaker diarization
• Neural Diarization with Non-autoregressive Intermediate Attractors [Fujita, ICASSP]
[Figure, left panel: BEFORE (autoregressive attractor)]
Audio sequence X -> TransEnc_1 -> … -> TransEnc_L -> Embeddings E_L
Embeddings E_L -> LSTM_enc -> LSTM_dec (autoregressive) -> Attractors A
Speaker labels Y = Sigmoid(A^T E_L)

[Figure, right panel: AFTER (non-autoregressive intermediate attractors)]
Audio sequence X -> TransEnc_l -> E_l
A_l = Attn(Q, E_l, E_l)
Y_l = Sigmoid(A_l^T E_l) (intermediate prediction)
E_l + W A_l -> TransEnc_{l+1} (conditioning)
Annotation: Speaker-wise
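
The AFTER panel can be sketched in NumPy as below. This is a minimal illustration, not the paper's implementation. It assumes single-head scaled dot-product attention for Attn, row-major shapes (frames x dims, so Sigmoid(A_l^T E_l) becomes sigmoid(E @ A.T)), and that the W A_l term is weighted by Y_l so it can be added to every frame. The slide shows only "W A_l" and "+", so that weighting is an assumption, as are the function name and the sizes T, S, D.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def intermediate_attractor_step(E, Q, W):
    """One intermediate-attractor step (hypothetical helper).

    E: (T, D) frame embeddings E_l from encoder layer l
    Q: (S, D) learned queries, one per speaker slot
    W: (D, D) conditioning projection
    """
    D = E.shape[1]
    # A_l = Attn(Q, E_l, E_l): every query attends over all frames at once,
    # so all attractors are produced in parallel (no LSTM decoding loop).
    weights = softmax(Q @ E.T / np.sqrt(D), axis=-1)  # (S, T)
    A = weights @ E                                   # (S, D)
    # Y_l = Sigmoid(A_l^T E_l): per-frame, per-speaker activity posterior.
    Y = sigmoid(E @ A.T)                              # (T, S)
    # Conditioning (assumed form): add the projected attractors, weighted by
    # the intermediate prediction, to the input of the next encoder layer.
    E_cond = E + Y @ (A @ W)                          # (T, D)
    return A, Y, E_cond


rng = np.random.default_rng(0)
T, S, D = 6, 3, 4  # frames, speaker slots, embedding size (toy values)
E = rng.standard_normal((T, D))
Q = rng.standard_normal((S, D))
W = rng.standard_normal((D, D))
A, Y, E_cond = intermediate_attractor_step(E, Q, W)
print(A.shape, Y.shape, E_cond.shape)
```

Because the queries are processed independently, reordering the rows of Q only reorders the attractors. That is the non-autoregressive property the slide contrasts with the LSTM encoder-decoder in the BEFORE panel.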