MUMBAI, India, April 17 -- Intellectual Property India has published a patent application (202641042550 A) filed by SRM University-AP, Guntur, Andhra Pradesh, on April 2, for 'Hybrid Compression and JIT Dequantization Framework for Transformer Inference on Legacy Edge Architectures.'
Inventor(s) include Kolusu, Naga Sai Manikanta; Rao, Akurathi Venkata Kaleswara; Kumar, Ghantasala Jaswanth; Nakka, Shekar; and Elumalai, Karthikeyan.
The application was published on April 17 under issue no. 16/2026.
According to the abstract released by Intellectual Property India: "The present invention relates to a system for real-time transformer inference on legacy edge architectures, comprising an edge computing device comprising a legacy graphics processing unit (GPU) having a Maxwell architecture, a memory configured to store a compressed transformer model in a Normal Float 4 (NF4) format, a Just-In-Time (JIT) adaptation layer, a knowledge distillation module, and an inference engine. The compressed transformer model is generated through a hybrid compression process combining knowledge distillation and post-training quantization. The JIT adaptation layer performs runtime reconstruction of model weights from the NF4 format to a Float16 (FP16) format, comprising a dequantization module that reconstructs approximate weights using a weight tensor stored as a 4-bit value, a first scaling factor stored as an 8-bit value, and a secondary scaling factor stored as an FP32 value."
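The abstract does not disclose block sizes, bit layouts or how the 8-bit scale is encoded, so the following is only an illustrative sketch of the general idea it describes: 4-bit NF4 codes, an 8-bit per-block scale and one FP32 secondary scale, expanded to FP16 at run time. The layout is loosely modeled on the "double quantization" scheme popularized by QLoRA/bitsandbytes. The 8-bit scale here uses a simple linear uint8 encoding chosen for clarity, and the names `quantize`, `dequantize_jit` and `BLOCK_SIZE` are hypothetical, not taken from the patent.

```python
import math
import struct

# The 16 NF4 code values as published with QLoRA / bitsandbytes (range [-1, 1]).
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]
BLOCK_SIZE = 64  # hypothetical: weights per first-level (8-bit) scale


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize(weights):
    """Offline step (hypothetical): floats -> packed 4-bit codes + two-level scales."""
    blocks = [weights[i:i + BLOCK_SIZE] for i in range(0, len(weights), BLOCK_SIZE)]
    absmax = [max(abs(w) for w in b) for b in blocks]
    # Secondary scale: one value per tensor, rounded to FP32 as it will be stored.
    s2 = struct.unpack("<f", struct.pack("<f", max(absmax) or 1.0))[0]
    # First-level scale: each block's absmax as an 8-bit fraction of s2.
    s1 = [min(255, math.ceil(a / s2 * 255)) for a in absmax]
    codes = []
    for block, q in zip(blocks, s1):
        scale = q / 255 * s2
        for w in block:
            x = w / scale if scale else 0.0
            # Pick the nearest NF4 level for the normalized weight.
            codes.append(min(range(16), key=lambda k: abs(NF4_CODEBOOK[k] - x)))
    if len(codes) % 2:
        codes.append(7)  # pad with the code for 0.0
    # Two 4-bit codes per byte, high nibble first.
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, bytes(s1), struct.pack("<f", s2), len(weights)


def dequantize_jit(packed, s1, s2_bytes, n):
    """Runtime step: rebuild approximate FP16 weights from NF4 codes and two scales."""
    (s2,) = struct.unpack("<f", s2_bytes)
    out = []
    for i in range(n):
        byte = packed[i // 2]
        code = byte >> 4 if i % 2 == 0 else byte & 0x0F
        scale = s1[i // BLOCK_SIZE] / 255 * s2  # 8-bit scale times FP32 scale
        out.append(to_fp16(NF4_CODEBOOK[code] * scale))
    return out


weights = [math.sin(0.37 * i) * 0.05 for i in range(200)]
packed, s1, s2, n = quantize(weights)
restored = dequantize_jit(packed, s1, s2, n)
print(len(packed), len(s1), len(s2))  # 100 4 4
```

In this sketch, 200 weights occupy 100 bytes of 4-bit codes plus four 1-byte block scales and a single 4-byte FP32 scale. Only the small two-level scale arrays have to be combined at run time before each code is expanded to FP16, which is the kind of on-the-fly reconstruction the abstract attributes to the JIT adaptation layer.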
Disclaimer: Curated by HT Syndication.