Speech is more than a medium for communication: along with words and their meanings, it carries emotion and can reveal a great deal about the speaker's mental state. A human can easily recognize these cues just by listening, but it is hard for software to do the same with comparable precision. The goal is to deploy a model that can analyze and authenticate speech precisely enough to predict the user's surroundings and current mental state, so that appropriate action can be taken.
This is why emotion detection has become an important marketing strategy, since the customer's mood plays an important role in buying decisions. Detecting a person's current emotion and suggesting an appropriate product, or offering help accordingly, can increase demand for the company's products.
To predict emotion from speech, the system performs the following steps:
1. Taking audio input from the user.
2. Analyzing the audio signal.
3. Masking and cleaning the audio.
4. Extracting features.
5. Loading our pretrained model.
6. Predicting the emotion.
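
The middle of this pipeline (masking, cleaning, and feature extraction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's actual code: the function names, the amplitude-envelope threshold used for masking, the two simple features (RMS energy and zero-crossing rate), and the placeholder model are all hypothetical. A real system would more likely read recorded audio, extract richer features, and load a trained classifier. A synthetic signal stands in for the user's audio input here.

```python
import math


def envelope_mask(signal, window=5, threshold=0.05):
    """Return one boolean per sample: True where the mean absolute
    amplitude in a window centred on the sample exceeds the threshold."""
    half = window // 2
    n = len(signal)
    mask = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        mean_abs = sum(abs(s) for s in signal[lo:hi]) / (hi - lo)
        mask.append(mean_abs > threshold)
    return mask


def clean(signal, window=5, threshold=0.05):
    """Drop the samples that the envelope mask marks as near-silent."""
    mask = envelope_mask(signal, window, threshold)
    return [s for s, keep in zip(signal, mask) if keep]


def extract_features(signal):
    """Return (RMS energy, zero-crossing rate) for the signal.

    These two features are illustrative assumptions only.
    """
    if len(signal) < 2:
        return (0.0, 0.0)
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return (rms, crossings / (len(signal) - 1))


def placeholder_model(features):
    """Hypothetical stand-in for the pretrained model.

    It ignores its input and always returns the same label; a real
    classifier would be loaded from disk instead.
    """
    return "neutral"


def predict_emotion(features, model):
    """Pass the extracted features to whatever model was loaded."""
    return model(features)


# Synthetic input: silence, a short tone, then silence again.
silence = [0.0] * 100
tone = [0.5 * math.sin(2 * math.pi * i / 20) for i in range(200)]
signal = silence + tone + silence

cleaned = clean(signal)                  # step 3: masking and cleaning
features = extract_features(cleaned)     # step 4: feature extraction
label = predict_emotion(features, placeholder_model)  # step 6: prediction
```

Keeping the model as a plain callable passed into `predict_emotion` means the cleaning and feature code can be tested on its own, without a trained model being available.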