MUMBAI, India, Jan. 2 -- Intellectual Property India has published a patent application (202541125388 A) filed by Anurag University, Hyderabad, Telangana, on Dec. 11, 2025, for 'an image-to-audio captioning system for generating multilingual spoken descriptions from a raw image input.'

Inventors include Dr. V. Vijaya Kumar; Dr. G. Vishnu Murthy; Mr. G. Kiran Kumar; Dr. Deva Rajasekhar; Mr. T. Srikanth; Dr. V. Krishnaiah; Dr. G. Balram; and Mr. Madar Bandu.

The application for the patent was published on Jan. 2, under issue no. 01/2026.

According to the abstract released by Intellectual Property India: "The present invention relates to an image-to-audio captioning system (100) configured to automatically generate multilingual and culturally relevant spoken descriptions from raw image inputs. The system comprises an input layer (102) for receiving an image, a convolutional neural network (CNN) feature-extraction block (104) for deriving visual feature vectors, and a caption generator employing a Long Short-Term Memory (LSTM) network (106) for producing a textual description corresponding to the visual content. The generated caption is processed by a translation layer (108) that converts the description into a user-selected target language with semantic and cultural adaptation. A text-to-speech (TTS) component (110) subsequently transforms the translated caption into an audio output. The system integrates computer vision, natural language processing, neural machine translation, and speech synthesis into a unified pipeline, enabling real-time generation of accessible, multilingual spoken captions useful for visually impaired users, educational applications, cross-cultural content consumption, and assistive technologies."
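The abstract describes a four-stage pipeline: CNN feature extraction (104), LSTM caption generation (106), translation with cultural adaptation (108), and TTS synthesis (110). As a purely illustrative sketch of how such stages could be wired together, the Python below uses placeholder classes and hard-coded outputs; none of the class names, interfaces, or strings come from the patent, and a real system would substitute trained CNN, LSTM, neural-MT, and TTS models for each stub.

```python
# Hypothetical sketch of the pipeline shape described in the abstract.
# Every component here is a stand-in, not the patented implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class Caption:
    text: str
    language: str

class FeatureExtractor:
    """Stand-in for the CNN feature-extraction block (104)."""
    def extract(self, image_bytes: bytes) -> List[float]:
        # Placeholder: a trained CNN would return a learned feature vector.
        return [b / 255.0 for b in image_bytes[:8]]

class CaptionGenerator:
    """Stand-in for the LSTM caption generator (106)."""
    def generate(self, features: List[float]) -> Caption:
        # Placeholder: an LSTM decoder would produce a description of the image.
        return Caption(text="a dog playing in a park", language="en")

class Translator:
    """Stand-in for the translation layer (108)."""
    def translate(self, caption: Caption, target_lang: str) -> Caption:
        # Placeholder: a neural MT model would adapt semantics and culture.
        if target_lang == caption.language:
            return caption
        return Caption(text=f"[{target_lang}] {caption.text}", language=target_lang)

class TextToSpeech:
    """Stand-in for the TTS component (110)."""
    def synthesize(self, caption: Caption) -> bytes:
        # Placeholder: a TTS engine would return audio samples, not text bytes.
        return caption.text.encode("utf-8")

def image_to_audio(image_bytes: bytes, target_lang: str) -> bytes:
    """Run the full pipeline: image -> features -> caption -> translation -> audio."""
    features = FeatureExtractor().extract(image_bytes)
    caption = CaptionGenerator().generate(features)
    translated = Translator().translate(caption, target_lang)
    return TextToSpeech().synthesize(translated)
```

The value of this shape, as the abstract emphasizes, is that each stage has a narrow interface (vector, caption, caption, bytes), so any single component can be swapped, for example to add a new target language, without touching the rest of the pipeline.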

Disclaimer: Curated by HT Syndication.