Slide 41
Copyright © RevComm Inc.
Introduction to research on emotion annotation using LLMs
Reference: J. Santoso, K. Ishizuka and T. Hashimoto, "Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition," ICASSP 2024
● Evaluation on the IEMOCAP dataset showed that, when given the conversational context and a textual representation of the acoustic features, an LLM can annotate emotions with accuracy nearly on par with human annotators (J. Santoso, ICASSP 2024)
[Figure blocks: Acoustic feature extractor; Conversion of acoustic feature to text; LLM]
(single utterance prompt example)
Answer with either one of [neutral, happy, sad, angry].
M speaks “Who did you marry?” with high pitch.
How does M feel? M feels
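
A minimal Python sketch of how the single utterance prompt above could be assembled from its parts. The function name `build_single_prompt` and its parameters are hypothetical and do not come from the paper; only the prompt wording follows the slide.

```python
# Emotion classes listed in the prompt on the slide.
LABELS = ["neutral", "happy", "sad", "angry"]


def build_single_prompt(speaker: str, text: str, feature_desc: str) -> str:
    """Assemble a single-utterance prompt in the format shown on the slide.

    speaker, text and feature_desc are illustrative parameter names
    (an assumption, not the authors' API).
    """
    label_list = ", ".join(LABELS)
    return (
        f"Answer with either one of [{label_list}].\n"
        f"{speaker} speaks “{text}” with {feature_desc}.\n"
        f"How does {speaker} feel? {speaker} feels"
    )


prompt = build_single_prompt("M", "Who did you marry?", "high pitch")
print(prompt)
```

The prompt ends with an open clause ("M feels") so that the LLM's continuation is the emotion label.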
(conversation prompt example)
Answer with either one of [neutral, happy, sad, angry].
Given the following conversation sequence:
M (high pitch): “So what’s up? What’s new?”
F (low pitch): “Well Vegas was awesome.”
M (normal pitch): “Yeah. I heard.”
F (high pitch): “And, um, I got married.”
M (high pitch): “Shut up. No-in Vegas?”
F (high pitch): “Yeah. In the old town part.”
M (high pitch): “Who did you marry?”
How does M feel? M feels
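
The conversation prompt above can be sketched in Python as follows. This is an illustration under assumptions: `build_conversation_prompt` and the `(speaker, feature description, text)` tuple layout are hypothetical, and the last turn is assumed to be the utterance being annotated.

```python
from typing import List, Tuple

# Emotion classes listed in the prompt on the slide.
LABELS = ["neutral", "happy", "sad", "angry"]


def build_conversation_prompt(turns: List[Tuple[str, str, str]]) -> str:
    """Build a conversation prompt in the format shown on the slide.

    turns holds (speaker, feature description, text) tuples, oldest first.
    The speaker of the last turn is treated as the annotation target
    (an assumption made for this sketch).
    """
    if not turns:
        raise ValueError("at least one turn is required")
    target = turns[-1][0]
    lines = [
        "Answer with either one of [" + ", ".join(LABELS) + "].",
        "Given the following conversation sequence:",
    ]
    for speaker, feature_desc, text in turns:
        lines.append(f"{speaker} ({feature_desc}): “{text}”")
    lines.append(f"How does {target} feel? {target} feels")
    return "\n".join(lines)


conv_prompt = build_conversation_prompt([
    ("F", "high pitch", "And, um, I got married."),
    ("M", "high pitch", "Who did you marry?"),
])
print(conv_prompt)
```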
(description of acoustic feature example)
Speaking rate: (slow / normal / fast) speaking rate
Articulation rate: (slow / normal / fast) articulation rate
Pitch: (low / normal / high) pitch
Loudness: (quiet / normal loudness / loud)
Intensity: (low / normal / high) intensity
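
As a rough illustration of how a numeric acoustic feature could be mapped to one of the three-level descriptions above, the sketch below uses fixed thresholds. The thresholds, units, and function names are all assumptions made for this example; the slide does not state how the category boundaries are determined.

```python
from typing import Tuple


def categorize(value: float, low_thr: float, high_thr: float,
               labels: Tuple[str, str, str]) -> str:
    """Return the low / normal / high label for value given two thresholds."""
    low, normal, high = labels
    if value < low_thr:
        return low
    if value > high_thr:
        return high
    return normal


def describe_pitch(f0_hz: float, low_thr: float = 120.0,
                   high_thr: float = 200.0) -> str:
    # Threshold values in Hz are placeholders, not values from the paper.
    return categorize(f0_hz, low_thr, high_thr, ("low", "normal", "high")) + " pitch"


def describe_speaking_rate(syll_per_sec: float, low_thr: float = 3.0,
                           high_thr: float = 5.0) -> str:
    # Threshold values in syllables per second are placeholders as well.
    return categorize(syll_per_sec, low_thr, high_thr,
                      ("slow", "normal", "fast")) + " speaking rate"


pitch_desc = describe_pitch(230.0)
rate_desc = describe_speaking_rate(4.0)
print(pitch_desc, "/", rate_desc)
```

In practice the boundaries would more plausibly be derived from data, for example per-speaker statistics, rather than hard-coded.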
[Figure labels: Input speech; Text content (transcription); Acoustic feature set (loudness, pitch, speaking rate, etc.); Description of acoustic feature (text); Prompt; Emotion class]