MUMBAI, India, June 22 -- Intellectual Property India has published a patent application (202641058558 A) filed by Amrita Vishwa Vidyapeetham on May 07, 2026, for "A Multi-Modal Facial Video Manipulation Detection System With Cross-Modal Attention."

Inventors include Dutta, Rachaita; Mukhopadhyay, Adwitiya; Das, Soumik; Jayalal, Gourav; and Rejesh, Sourav.

The patent application was published on June 12, 2026, under issue no. 24/2026.

Abstract: The present disclosure provides a system (100) for detecting manipulated facial video content. The system includes a frame sampler (104) sampling T frames from an input video (102), a patch-based visual encoder (106) extracting a visual feature vector per frame, and a landmark graph encoder (108) yielding a geometric feature vector per frame. A cross-modal attention unit (110) produces an attended representation where queries derive from the visual feature vector and keys and values derive from the geometric feature vector. A learnable gate (112) progressively incorporates geometric feature contributions during training while preserving pre-learned visual feature integrity. A temporal encoder (114) and fusion unit (116) model temporal inconsistencies, and a classifier (118) outputs a binary decision signal identifying the input video (102) as real or manipulated. Unlike conventional single-modal detection approaches, the present disclosure captures complementary structural and visual cues for robust manipulation detection.
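The cross-modal attention and gating described in the abstract can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation. The abstract specifies only that queries come from the visual features, keys and values come from the geometric features, and a learnable gate phases in the geometric contribution. Everything else here is an assumption made for illustration: the function name `gated_cross_modal_attention`, the feature dimensions, the single attention head, attention computed across the T sampled frames, the random projection weights, and the choice of a zero-initialized tanh gate on a residual connection as one common way to leave pre-learned visual features untouched at the start of training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_modal_attention(visual, geometric, w_q, w_k, w_v, gate):
    """Hypothetical single-head cross-modal attention with a scalar gate.

    visual:    (T, d_vis) per-frame visual feature vectors
    geometric: (T, d_geo) per-frame geometric (landmark) feature vectors
    gate:      scalar; trainable in a real model, fixed here
    """
    q = visual @ w_q        # queries derive from the visual features
    k = geometric @ w_k     # keys derive from the geometric features
    v = geometric @ w_v     # values derive from the geometric features
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, T) frame-to-frame scores
    attended = softmax(scores, axis=-1) @ v   # (T, d_vis)
    # Assumed gating form: with gate == 0, tanh(0) == 0 and the output
    # equals the visual features exactly.
    return visual + np.tanh(gate) * attended

# Illustrative sizes and random weights (assumptions, not from the application).
rng = np.random.default_rng(0)
T, d_vis, d_geo = 8, 16, 12
visual = rng.normal(size=(T, d_vis))
geometric = rng.normal(size=(T, d_geo))
w_q = rng.normal(size=(d_vis, d_vis)) * 0.1
w_k = rng.normal(size=(d_geo, d_vis)) * 0.1
w_v = rng.normal(size=(d_geo, d_vis)) * 0.1

out_closed = gated_cross_modal_attention(visual, geometric, w_q, w_k, w_v, gate=0.0)
out_open = gated_cross_modal_attention(visual, geometric, w_q, w_k, w_v, gate=1.0)
```

With the gate at zero, `out_closed` is identical to the visual features, which is one way to read "preserving pre-learned visual feature integrity." As the gate moves away from zero, as in `out_open`, the geometric features contribute progressively more to the attended representation.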

Disclaimer: Curated by HT Syndication.