Why is emotional valence so hard to predict from speech?