MODE DECOMPOSITION
AGENDA
▸ 1) Utilization of Virtual Human Agent by Classification of Emotions (Why?)
  ▸ What is “Humanness” in the first place?
▸ 2) Classification of emotions of speech by convolution of Graph (How to?)
  ▸ Possibility of Graph as Unified feature quantity
▸ 3) Classification of emotions of speech by Dynamic Mode Decomposition (How to?)
  ▸ Classification of speech emotions by Dynamic Mode Decomposition on Deep Learning
WHAT IS BROUGHT TO RACHEL IN SENTIMENT ANALYSIS?
▸ Add emotion identification to the Virtual Human Agent
  ▸ For example, when the user says “I got a ticket today” (happy!), the Virtual Human Agent returns “That was good!” (empathy)
WHAT IS THE “HUMANNESS”?
▸ “Outer reception” and “Inner reception”
  ▸ “Inner reception” is a physical internal reaction, including that of the internal organs, such as motivation associated with tension.
  ▸ As an organ system, it refers to an autonomous system that “cannot be controlled by oneself” (heart rate, blood pressure, etc.).
  ▸ It is a remarkable system as an organ: it is “necessary to be conscious” of it oneself, and the focal point of the gaze is an example.
  ▸ “Outer reception” is closely related to psychological reactions, represented by cerebral reactions and the Circumplex Model.
[Figure: diagram labeled Inner reception, Outer reception, Experience + Preference, and Humanness / Personality; example reaction “Beautiful!”]
WHAT IS THE “HUMANNESS”?
▸ “Active” system and “Passive” system
  ▸ From the viewpoint of outer reception and inner reception, there are active systems that “react autonomously” through cerebral function and passive systems that wait for stimulation.
▸ Tendency of cerebral site reaction
  ▸ Outer reception tends to be located inside the cerebrum; its typical organ is the “insular cortex”.
  ▸ In vision, the organs that capture moving bodies (inner reception) tend to lie toward the outside of the cerebrum, whereas outer reception, which performs “higher-order cognition accompanied by sensibility, in more detail”, tends to lie inside the cerebrum.
[Figure: inner reception, “something like green”; outer reception, “Elephant!”]
WHAT IS THE “HUMANNESS”?
▸ Difference in reaction rate of cerebral regions
  ▸ The “inner reception” reaction is twice as fast as the “outer reception” reaction. From this difference in speed, recognition accompanied by sensibility in outer reception is thought to be higher-order behavior.
▸ Hypothesis for the definition of sensibility
  ▸ For these inner and outer receptions, we assume that “a response with a large difference between them is a sensibility response”. For example, when the inner reception shows no reaction but a large response is observed in the outer reception, the reaction is “aware”.
[Figure: inner reception is fast (“Hot!”, nothing to think about); outer reception is slow (“Beautiful!”, with experience and preference). No difference between inner and outer reception: not humanness. A difference between them: humanness.]
WHAT IS THE “HUMANNESS”?
▸ “Sensibility” from the viewpoint of modeling
  ▸ High-level relationship of sensibility
    ▸ From the low level upward, we take the phases “reaction”, “emotion”, and “sensibility”, and treat sensibility as a higher-order brain function.
    ▸ This shows that the range of individual differences in reaction, which depend on “personality”, widens at higher levels. At the same time, the higher levels depend on the lower ones, so the range of responsiveness of “sensibility” varies with circumstances and environmental factors such as exercise and low temperature.
[Figure: Reaction, Emotion, Sensibility, with Humanness / Personality]
WHAT IS THE “HUMANNESS”?
▸ “Sensibility” from the viewpoint of modeling
  ▸ Contrast with the Circumplex Model
    ▸ The Circumplex Model expresses emotion broadly along a pleasure axis and an active axis.
    ▸ Here we add a subjective axis (psychological axis) and extend the range of explanation to sensibility, the level above emotion. With this axis, we explain the change from the inner-reception reaction “beat” to the outer-reception reaction “excitement”.
[Figure: subjective axis (psychological axis) from Emotion to Sensibility; “beat” becomes “excitement”; individual difference]
WHAT IS THE “HUMANNESS”?
▸ Emotion classification of an “individual's utterance”
  ▸ A good dataset that includes individual differences, and the accuracy of identifying them
  ▸ Given a discrimination layer for individual differences, if the model answers correctly on both “1) individual differences” and “2) classifications”, we assume that a human-like reaction has been made, and its accuracy is evaluated in NLP.
[Figure: utterances of individuals A and B go to an emotion classifier (active / pleasure axis) and an individual classifier (subjective / psychological axis); each is scored as good or bad with an accuracy (F1 value), and the two are combined into a total accuracy (F1 value)]
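
The evaluation above can be sketched as follows. This is a minimal illustration, not the slides' actual evaluation code: the slides do not say how the two F1 values are combined, so this sketch assumes the "total" score is the macro-F1 over the joint (individual, emotion) label, which counts a prediction as correct only when both parts are right. The labels and the helper name `macro_f1` are hypothetical.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Hypothetical toy labels: two individuals (A, B) and two emotion classes.
spk_true = ["A", "A", "B", "B"]
spk_pred = ["A", "A", "B", "A"]
emo_true = ["happy", "sad", "happy", "sad"]
emo_pred = ["happy", "sad", "sad", "sad"]

# Assumption: the total score uses the joint label "individual|emotion".
joint_true = [f"{s}|{e}" for s, e in zip(spk_true, emo_true)]
joint_pred = [f"{s}|{e}" for s, e in zip(spk_pred, emo_pred)]

f1_individual = macro_f1(spk_true, spk_pred)
f1_emotion = macro_f1(emo_true, emo_pred)
f1_total = macro_f1(joint_true, joint_pred)
print(f1_individual, f1_emotion, f1_total)
```

The joint score is stricter than either separate score, which matches the idea that a "human-like" answer has to get both the individual and the emotion right.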
WHICH DATA SET DO YOU USE?
▸ RAVDESS Dataset (Ref. https://smartlaboratory.org/ravdess/)
  Songs / Normal   Classes (*)   Persons (*)   Utterance   Strength (*)   Repeat
  2                8             24            2           2              2
  (*) Classes: neutral, calm, happy, sad, angry, fearful, disgust, surprised
  (*) Persons: 12 men and 12 women
  (*) Strength: Normal and Strong
▸ Spectrogram and changes in time series for each category
[Figure: speech consisting of complicated waveforms]
WHICH FEATURE DO YOU USE?
▸ Mel-Frequency Cepstral Coefficients (MFCC)
  ▸ Expands the low-frequency range, where voice features lie
▸ Tonality analysis (HPCP / Harmonic Pitch Class Profile)
  ▸ Uses pitch (musical scale: A / B / C, etc.) as bands, giving a time-series-based feature quantity
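
As a rough illustration of the first feature, the following is a textbook-style MFCC computation in NumPy. It is a sketch under stated assumptions, not the pipeline used for the slides: the frame size, hop, number of mel bands, and number of coefficients are hypothetical defaults, and library implementations differ in windowing and normalization details.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (dense at low frequencies)."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Return an (n_frames, n_mfcc) array of cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum per frame
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II (unnormalized) of the log mel energies gives the cepstrum.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)
```

The mel spacing is what gives the "low-frequency expansion": more filters cover the low band, where most voice information sits, than the high band.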
CLASSIFICATION OF EMOTIONS OF SPEECH BY CONVOLUTION OF GRAPH (HOW TO?)
▸ Graph Convolution (Abstract)
  ▸ Convolution using graph structure
  ▸ A graph can express the luminance of pixels (image), the value of each element of a spectrogram (sound), and the connections between words (NLP), so graph convolution is expected to yield “unified feature values”.
  ▸ However, the connectivity of the graph is unspecified, so the data cannot be represented as a graph as is.
[Figure: graph convolution]
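
A minimal sketch of one graph-convolution layer on a spectrogram follows. Because the slides leave the graph connectivity unspecified, this example assumes a 4-neighbour grid over (frequency, time) bins, and it uses the common symmetric-normalized propagation rule `ReLU(D^-1/2 (A + I) D^-1/2 X W)`; both choices are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """4-neighbour adjacency for a rows x cols grid (assumed connectivity)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:            # neighbour along the time axis
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:            # neighbour along the frequency axis
                A[i, i + cols] = A[i + cols, i] = 1.0
    return A

def graph_conv(X, A, W):
    """One layer: add self-loops, normalize symmetrically, mix features, apply ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ X @ W)

rng = np.random.default_rng(0)
spec = rng.random((4, 5))              # toy "spectrogram": 4 bands x 5 frames
A = grid_adjacency(*spec.shape)
X = spec.reshape(-1, 1)                # one feature (the bin value) per node
W = rng.standard_normal((1, 8))        # untrained weights: 1 -> 8 channels
H = graph_conv(X, A, W)
print(H.shape)
```

The same layer applies unchanged to pixel grids or word graphs once an adjacency matrix is chosen, which is the sense in which the graph could act as a unified representation.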
POSSIBILITY OF GRAPH AS UNIFIED FEATURE QUANTITY
▸ Consideration
  ▸ The convolution filter H represents the smoothness over frequency, which corresponds to resolution. Capturing more edge directions therefore gives higher resolution, and hence higher accuracy.
  ▸ At the same time, there is a limit to the decomposition of the time series and the band.
[Figure: filters H_0, H_1, ..., H_n; higher accuracy]
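
One possible reading of "the filter H represents smoothness" is graph spectral filtering, sketched below. This is an assumption, not the slides' construction: the filters are taken to be powers of a low-pass operator, `H_k = (I - L / lambda_max)^k`, on a simple path graph, and smoothness is measured by the Laplacian quadratic form `x^T L x` (smaller means smoother). The names `path_laplacian`, `filter_bank`, and `roughness` are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph with n nodes."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def filter_bank(L, n_filters):
    """Assumed filters H_k = (I - L / lambda_max)^k for k = 0 .. n_filters - 1."""
    lam_max = np.linalg.eigvalsh(L)[-1]      # eigvalsh returns ascending eigenvalues
    S = np.eye(L.shape[0]) - L / lam_max     # low-pass: damps high graph frequencies
    return [np.linalg.matrix_power(S, k) for k in range(n_filters)]

def roughness(x, L):
    """Laplacian quadratic form: sum of squared differences across edges."""
    return float(x @ L @ x)

rng = np.random.default_rng(0)
L = path_laplacian(16)
x = rng.standard_normal(16)                  # noisy signal on the graph nodes
bank = filter_bank(L, 4)
rough = [roughness(H @ x, L) for H in bank]
print(rough)
```

Each successive filter output is smoother than the previous one, so a bank of such filters views the same signal at several resolutions.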
CLASSIFICATION OF EMOTIONS OF SPEECH BY DYNAMIC MODE DECOMPOSITION (DMD) (HOW TO?)
▸ Dynamic mode decomposition (Abstract)
  ▸ Mode decomposition focusing on time series variation
  ▸ The character of a sound appears especially in its time-series fluctuations. It is therefore insufficient to convolve the phase and the band at the same time in the graph; we need to “pay attention to the time series”.
[Figure: prediction and acquisition of long-cycle variation]
CLASSIFICATION OF EMOTIONS OF SPEECH BY DYNAMIC MODE DECOMPOSITION (HOW TO?)
▸ Dynamic mode decomposition (Details), on feature engineering
  ▸ Time series variation: definition of time evolution
  ▸ Low rank approximation: singular value decomposition
  ▸ Eigen mode calculation: eigenvalue decomposition
[Figure: the three steps in sequence; labels "difference" and "mapping in linear space"]
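
The three steps above can be sketched with the standard exact-DMD formulation. This is a minimal illustration, not the slides' implementation: it assumes the feature matrix `X` has one column per time frame, and the toy data (a decaying rotation embedded in five dimensions) and all variable names are hypothetical.

```python
import numpy as np

def dmd(X, r):
    """Exact DMD of snapshots X (features x time) with rank-r truncation."""
    # 1) Time evolution: pair each snapshot with its successor, X2 ~ A @ X1.
    X1, X2 = X[:, :-1], X[:, 1:]
    # 2) Low-rank approximation: truncated SVD of X1.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    B = X2 @ Vh.conj().T / s                 # X2 V S^-1 (column j divided by s[j])
    A_tilde = U.conj().T @ B                 # operator projected onto r dimensions
    # 3) Eigenmode calculation: eigendecomposition of the reduced operator.
    eigvals, W = np.linalg.eig(A_tilde)
    modes = B @ W                            # DMD modes in the original feature space
    return eigvals, modes

# Toy data: a 2-D decaying rotation observed through 5 hypothetical features.
rng = np.random.default_rng(0)
rho, theta = 0.9, 0.3
A_small = rho * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
C = rng.standard_normal((5, 2))
z = np.array([1.0, 0.0])
snapshots = []
for _ in range(30):
    snapshots.append(C @ z)
    z = A_small @ z
X = np.stack(snapshots, axis=1)              # shape (5, 30)

eigvals, modes = dmd(X, r=2)
print(np.abs(eigvals), np.angle(eigvals))
```

The magnitude of each eigenvalue gives the per-frame growth or decay of its mode, and the angle gives its oscillation frequency per frame, which is how DMD exposes time-series variation.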
CLASSIFICATION OF EMOTIONS OF SPEECH BY DYNAMIC MODE DECOMPOSITION (HOW TO?)
▸ Dynamic mode decomposition (Details), from feature engineering to deep learning
  ▸ Long cycle variation: time evolution from the DMD modes
  ▸ Tucker decomposition: tensor expansion of SVD and EIGH
  ▸ Emotional classification: from the obtained modes to classification
[Figure: the three steps in sequence; labels "on feature engineering", "on deep learning", and "mapping and decomposing all data on one axis"]
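
The Tucker step can be illustrated with a truncated higher-order SVD (HOSVD), one common way to compute a Tucker decomposition. This is a sketch under assumptions: the slides do not state which algorithm or tensor layout was used, so the three axes here (for example utterance, feature band, mode) and the chosen ranks are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that axis `mode` becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))   # contracted axis is replaced, lands first
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: one SVD per unfolding, then project to get the core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    T = core
    for mode, U in enumerate(factors):
        T = mode_dot(T, U, mode)
    return T

# Toy tensor with exact multilinear rank (2, 3, 2).
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
mats = [rng.standard_normal((6, 2)),
        rng.standard_normal((7, 3)),
        rng.standard_normal((5, 2))]
T = reconstruct(G, mats)                       # shape (6, 7, 5)

core, factors = hosvd(T, ranks=(2, 3, 2))
err = np.linalg.norm(reconstruct(core, factors) - T) / np.linalg.norm(T)
print(core.shape, err)
```

The small core tensor is the compressed representation that a downstream classifier could consume; with real data the ranks trade compression against reconstruction error.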
CLASSIFICATION OF EMOTIONS OF SPEECH BY DYNAMIC MODE DECOMPOSITION (HOW TO?)
▸ Consideration
  ▸ Emotion in speech appears strongly in pitch and has prominent features in the time series. For that reason, validation accuracy improves with DMD and convolution that focus on them.
[Figure: features in pitch and time series; acquisition of the “happy” classification]
SUMMARY
▸ Graph Convolution
  ▸ High accuracy on images (over 98%)
  ▸ Accuracy on sound (70%)
  ▸ However, when targeting emotions, convolution is less effective because of phase and bandwidth (10%-30%)
▸ Dynamic Mode Decomposition Convolution
  ▸ Acquiring emotional features from spectrograms is less accurate (40%)
  ▸ However, given the relationship between pitch and emotion, high generalization performance can be obtained by paying attention to their time series (around 90%)