MUMBAI, India, June 22 -- Intellectual Property India has published a patent application (202641070361 A) filed by Dharsana Dharani Vg and Abinaya A on June 5, 2026, for Real-Time Multimodal Deep Fake Detection Using Visual-Audio Alignment And Gaze-Texture Fusion.
Inventors include E. Uma; Dharanika S; Dharsana Dharani Vg; Abinaya A; and Vishal R.
The application for the patent was published on June 12, 2026, under issue no. 24/2026.
Abstract: The rapid growth of deepfake technology has created serious concerns regarding the authenticity and reliability of digital media, as manipulated videos are becoming increasingly realistic and difficult to detect using traditional methods. To address this challenge, a multimodal deepfake detection framework is proposed that combines visual, audio, and synchronization features for improved detection accuracy. The system uses a Vision Transformer (ViT) to extract detailed spatial features from facial video frames, while WavLM analyzes the temporal and spectral characteristics of speech signals. In addition, SyncNet is employed to verify lip-sync consistency between facial movements and corresponding audio. The framework follows a systematic pipeline that includes data preprocessing, feature extraction, multimodal fusion, and final classification. Video frames and audio signals are processed independently before being integrated into a unified feature representation. The extracted features from ViT, WavLM, and SyncNet are fused into a single feature vector and passed into a supervised learning model to classify videos as real or fake.
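The fusion and classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the features are fused or which supervised model is used, so the sketch assumes plain concatenation followed by a logistic classifier. The function names, feature sizes, weights, and threshold are hypothetical placeholders, and the short lists stand in for real ViT, WavLM, and SyncNet outputs.

```python
import math
from typing import List, Sequence


def fuse_features(visual: Sequence[float],
                  audio: Sequence[float],
                  sync_score: float) -> List[float]:
    """Concatenate per-modality features into one vector.

    Assumption: concatenation is used only for illustration; the
    abstract does not name the fusion rule.
    """
    return list(visual) + list(audio) + [sync_score]


def fake_probability(fused: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Score a fused vector with a logistic model (assumed classifier)."""
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have equal length")
    z = bias + sum(w * x for w, x in zip(weights, fused))
    # Numerically stable sigmoid: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(fused: Sequence[float],
             weights: Sequence[float],
             bias: float,
             threshold: float = 0.5) -> str:
    """Label a video 'fake' when the score reaches the threshold."""
    return "fake" if fake_probability(fused, weights, bias) >= threshold else "real"


# Placeholder inputs standing in for real model outputs (hypothetical values).
visual_features = [0.2, -0.1]   # stand-in for a ViT frame embedding
audio_features = [0.4]          # stand-in for a WavLM speech embedding
lip_sync_score = 0.9            # stand-in for a SyncNet consistency score

fused_vector = fuse_features(visual_features, audio_features, lip_sync_score)
example_weights = [0.5, -0.3, 0.8, -2.0]  # hypothetical, not learned
label = classify(fused_vector, example_weights, bias=0.0)
```

In practice the weights would be learned from labeled real and fake videos, and the embeddings would have hundreds of dimensions rather than the handful shown here.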
Disclaimer: Curated by HT Syndication.