Personalizing Speech Emotion Recognition Systems to Target Speakers
Unsupervised personalization of SER systems by adapting deep learning models to target speakers: The unique properties of the externalization of valence from speech
-
Finding closer source-domain speakers at the level of emotional cues to personalize speech emotion recognition systems to target speakers. This approach works best for recognizing valence from speech because of the speaker-dependent nature of valence emotional cues.
-
Achieved a 12.01% relative gain in valence prediction performance, measured by the concordance correlation coefficient (CCC)
-
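The CCC metric above rewards both correlation and agreement in scale and mean between predictions and labels. A minimal sketch of the standard formula, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²):

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson correlation, CCC penalizes systematic offsets: predictions shifted by a constant score below 1 even though they correlate perfectly.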
Personalization approaches used include: fine-tuning with additional adaptation data, assigning higher weights to closer speakers, and adaptation under a few-shot learning framework
-
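One generic way to realize "assigning higher weights to closer speakers" is a softmax over negative distances between speaker-level feature embeddings; this is a hypothetical sketch, not the authors' exact formulation, and the `temperature` parameter is an assumption for illustration:

```python
import numpy as np

def source_speaker_weights(target_feat, source_feats, temperature=1.0):
    """Weight source-domain speakers by closeness to the target speaker.

    target_feat:  1-D mean feature vector for the target speaker
    source_feats: 2-D array, one row of mean features per source speaker
    Returns normalized weights (closer speakers get higher weight),
    e.g. for use as per-sample weights when fine-tuning.
    """
    d = np.linalg.norm(np.asarray(source_feats) - np.asarray(target_feat), axis=1)
    w = np.exp(-d / temperature)  # smaller distance -> larger weight
    return w / w.sum()
```

Lower temperatures concentrate the weight on the nearest source speakers; higher temperatures approach uniform weighting.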
This study serves as a proof of concept for robust emotion recognition for target speakers in clinical and defense applications, and further mitigates the performance-degrading effects of concept drift
-
In Press: Coming soon…