Intellectual Property India Publishes Patent Application for 'A Unified Multimodal Deep Learning System For Detecting Manipulated Audio, Image, And Video Content' Filed by Seshadri Rao Gudlavalleru Engineering College; Balakrishna Tilakachuri; Mohammed Adil; Motukuri Shrivalli; Matta Kanaka Sri; and Parasa Naga Veera Vardhan

Posted On: 2026-05-01 Patentwipo

MUMBAI, India, May 1 -- Intellectual Property India has published a patent application (202641049582 A) filed by Seshadri Rao Gudlavalleru Engineering College; Balakrishna Tilakachuri; Mohammed Adil; Motukuri Shrivalli; Matta Kanaka Sri; and Parasa Naga Veera Vardhan, Gudlavalleru, Andhra Pradesh, on April 18, for 'a unified multimodal deep learning system for detecting manipulated audio, image, and video content.'

Inventor(s) include Seshadri Rao Gudlavalleru Engineering College; Balakrishna Tilakachuri; Mohammed Adil; Motukuri Shrivalli; Matta Kanaka Sri; and Parasa Naga Veera Vardhan.

The application for the patent was published on May 1, under issue no. 18/2026.

According to the abstract released by Intellectual Property India: "The rapid advancement of artificial intelligence has led to the emergence of deepfake technologies capable of generating highly realistic manipulated media across images, videos, and audio. These developments pose significant challenges in digital security, misinformation control, and content authentication. The present invention proposes a unified multimodal deep learning system designed to detect manipulated multimedia content across multiple modalities within a single framework. The system processes input media through modality-specific pipelines, including image, video, and audio analysis modules. Image-based detection is performed using convolutional neural networks that extract spatial features to identify visual inconsistencies. Video-based detection incorporates both spatial and temporal analysis using a hybrid architecture combining convolutional neural networks and sequence modeling techniques. Audio-based detection utilizes feature extraction methods such as Mel-Frequency Cepstral Coefficients, followed by classification using machine learning algorithms to identify synthetic or manipulated speech patterns. A decision fusion mechanism integrates the outputs from each modality-specific model to generate a final classification result indicating whether the media is real or fake. The system also computes confidence scores to indicate the reliability of predictions. The modular design enables scalability and independent updates to each detection component, allowing adaptability to evolving deepfake generation techniques. The proposed system provides an efficient, scalable, and accurate solution for detecting manipulated multimedia content and is applicable in digital forensics, media verification, cybersecurity, and misinformation prevention."
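The abstract does not specify how the decision fusion step combines the per-modality outputs. As a rough illustration only, the sketch below assumes a simple weighted average of per-modality "fake" probabilities, with the confidence score taken as the probability of the winning label. The function name `fuse_scores`, the modality keys, the weights, and the 0.5 threshold are all hypothetical and are not drawn from the patent application.

```python
from typing import Dict, Optional, Tuple


def fuse_scores(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, float]:
    """Combine per-modality fake probabilities into one label and confidence.

    scores:  hypothetical mapping such as {"image": 0.9, "audio": 0.7}, where
             each value is that modality model's estimated probability that
             the media is fake (0.0 to 1.0).
    weights: optional per-modality weights; equal weights are assumed if omitted.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}

    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted average of the modality-level fake probabilities (assumed rule).
    p_fake = sum(weights[name] * scores[name] for name in scores) / total_weight

    # Assumed decision threshold of 0.5; confidence is the winning probability.
    label = "fake" if p_fake >= 0.5 else "real"
    confidence = max(p_fake, 1.0 - p_fake)
    return label, confidence
```

For example, `fuse_scores({"image": 0.9, "video": 0.8, "audio": 0.7})` would label the media as fake with a confidence of about 0.8. Because only the modalities present in `scores` are combined, a sketch like this also handles inputs that lack one modality, such as an image with no audio track.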

Disclaimer: Curated by HT Syndication.
