MUMBAI, India, Nov. 21 -- Intellectual Property India has published a patent application (202541017131 A) filed on Feb. 27 by Nandini S and Dr. Mohan K G of Vijaynagar, Karnataka, for 'Enhancing Real-Time Speech Emotion Recognition with Deep Learning and Data Augmentation.'

Inventors include Nandini S and Dr. Mohan K G.

The patent application was published on Nov. 21 under issue no. 47/2025.

According to the abstract released by Intellectual Property India: "In human interactions, emotions are easily perceived through facial expressions, body gestures, and speech. However, detecting emotions in human-machine interactions is challenging. To enhance this interaction, Speech Emotion Recognition (SER) aims to identify emotions solely through vocal intonation. This work proposes a deep learning-based SER system incorporating two data augmentation techniques: noise addition and spectrogram shifting. We evaluate our approach using three benchmark datasets: TESS, EmoDB, and RAVDESS. To extract relevant vocal features, we employ Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel spectrograms, Root Mean Square (RMS), and chroma features. For classification, we experiment with three deep learning models: Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and a hybrid CNN-Bidirectional Long Short-Term Memory (Bi-LSTM) model. Our findings indicate that the CNN-BiLSTM model, combined with data augmentation, achieves the highest accuracy for real-time speech emotion recognition. This study demonstrates the effectiveness of deep learning in SER and highlights the potential of hybrid models for improving emotion detection in human-machine interactions."
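For readers unfamiliar with the workflow the abstract describes, the following is a minimal, illustrative Python sketch of the feature-extraction and noise-addition steps named there. The library choice (librosa), the parameter values, the time-averaged pooling, and the interpretation of "spectrogram shifting" as a roll along the time axis are all assumptions for illustration, not details drawn from the patent application itself.

```python
# Illustrative sketch only; not the inventors' implementation.
import numpy as np
import librosa


def add_noise(y, noise_level=0.005):
    """Noise-addition augmentation: mix low-amplitude Gaussian noise into the waveform.
    The noise_level value is an assumed, illustrative choice."""
    return y + noise_level * np.random.randn(len(y))


def shift_spectrogram(spec, shift=4):
    """Spectrogram-shifting augmentation, assumed here to be a roll along the time axis."""
    return np.roll(spec, shift, axis=1)


def extract_features(y, sr):
    """Return one fixed-length vector from the five feature types named in the abstract
    (MFCC, ZCR, Mel spectrogram, RMS, chroma), averaged over time (an assumed pooling choice)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    zcr = librosa.feature.zero_crossing_rate(y)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    rms = librosa.feature.rms(y=y)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, zcr, mel, rms, chroma)])


if __name__ == "__main__":
    # Demo on a clip bundled with librosa. A real SER pipeline would instead iterate
    # over TESS, EmoDB, and RAVDESS recordings and feed the vectors to an MLP, CNN,
    # or CNN-BiLSTM classifier.
    y, sr = librosa.load(librosa.example("trumpet"))
    features = extract_features(add_noise(y), sr)
    print(features.shape)
```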

Disclaimer: Curated by HT Syndication.