Personalizing Speech Emotion Recognition Systems to Target Speakers
Unsupervised personalization of SER systems by adapting deep learning models to target speakers: The unique properties of the externalization of valence from speech
-
Finding closer source-domain speakers at the level of emotional cues to personalize speech emotion recognition systems to target speakers. This approach works best for recognizing valence from speech because of the speaker-dependent nature of valence emotional cues.
-
Achieved a 12.01% relative gain in valence prediction performance, measured by the concordance correlation coefficient (CCC)
-
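The CCC metric above rewards both correlation and agreement in scale and mean between predictions and labels. A minimal sketch of the standard formula, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²):

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson correlation, CCC penalizes systematic offsets: predictions shifted by a constant score below 1 even though they correlate perfectly.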
Personalization approaches used include: fine-tuning with additional adaptation data, assigning higher weights to closer speakers, and adaptation under a few-shot learning framework
-
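One generic way to realize "assigning higher weights to closer speakers" is a softmax over negative distances between speaker-level feature embeddings; this is a hypothetical sketch, not the authors' exact formulation, and the `temperature` parameter is an assumption for illustration:

```python
import numpy as np

def source_speaker_weights(target_feat, source_feats, temperature=1.0):
    """Weight source-domain speakers by closeness to the target speaker.

    target_feat:  1-D mean feature vector for the target speaker
    source_feats: 2-D array, one row of mean features per source speaker
    Returns normalized weights (closer speakers get higher weight),
    e.g. for use as per-sample weights when fine-tuning.
    """
    d = np.linalg.norm(np.asarray(source_feats) - np.asarray(target_feat), axis=1)
    w = np.exp(-d / temperature)  # smaller distance -> larger weight
    return w / w.sum()
```

Lower temperatures concentrate the weight on the nearest source speakers; higher temperatures approach uniform weighting.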
This study serves as a proof of concept for robust emotion recognition for target speakers in clinical and defense applications, and further mitigates the performance-degrading effects of concept drift
-
In Press: Coming soon…