MUMBAI, India, June 30 -- Intellectual Property India has published a patent application (202621054766 A) filed by Indian Institute Of Technology Gandhinagar on April 29, 2026, for System And Method For Generating A Coreset From A Large Unlabelled Dataset.

Inventors include Singh, Prajwal Kumar; Vashishtha, Gautam; Mastan, Indra; and Raman, Shanmuganathan.

The application for the patent was published on June 26, 2026, under issue no. 26/2026.

Abstract: The present disclosure describes a system (100) and method (200) for generating a representative coreset (122) from a large unlabelled dataset to enable efficient self-supervised learning. The system (100) comprises a feature processing module (104) configured to extract feature representations from a domain-specific dataset and an open-set dataset. A filter construction module (106) converts features of the domain-specific dataset into binary encodings and stores them in a counting Bloom filter (108) for probabilistic membership testing. A candidate selection module (112) evaluates features from the open-set dataset against the counting Bloom filter (108) to identify candidate samples. A refinement module (118) performs similarity-based filtering to select relevant samples, forming the coreset (122). An output module (124) provides the coreset (122) for training a self-supervised learning model, thereby reducing computational cost and sampling time while maintaining downstream performance.
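
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the abstract does not say how features are binarised, which hash functions the filter uses, or which similarity measure and threshold the refinement step applies. The sketch assumes sign-of-random-projection encodings, SHA-256-derived hash indexes, and cosine similarity, and every name in it (`CountingBloomFilter`, `binary_code`, `build_coreset`, `sim_threshold`) is hypothetical.

```python
import hashlib
import math
import random


class CountingBloomFilter:
    """Bloom filter with integer counters, so keys can also be removed."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, key):
        # Assumption: derive the hash indexes from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def remove(self, key):
        # Only meaningful for keys that were previously added.
        for idx in self._indexes(key):
            if self.counters[idx] > 0:
                self.counters[idx] -= 1

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all(self.counters[idx] > 0 for idx in self._indexes(key))


def make_planes(dim, num_bits, seed=0):
    """Random hyperplanes used to binarise features (assumed encoding)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def binary_code(feature, planes):
    """One bit per hyperplane: 1 if the feature lies on its positive side."""
    bits = (
        "1" if sum(p * x for p, x in zip(plane, feature)) >= 0 else "0"
        for plane in planes
    )
    return "".join(bits).encode("ascii")


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_coreset(domain_feats, open_feats, num_bits=8, sim_threshold=0.9, seed=0):
    """Return indexes of open-set samples kept in the coreset."""
    planes = make_planes(len(domain_feats[0]), num_bits, seed)

    # Filter construction: store binary codes of the domain-specific features.
    cbf = CountingBloomFilter()
    for feat in domain_feats:
        cbf.add(binary_code(feat, planes))

    # Candidate selection: probabilistic membership test on open-set codes.
    candidates = [
        i for i, feat in enumerate(open_feats) if binary_code(feat, planes) in cbf
    ]

    # Refinement: keep candidates close enough to some domain feature.
    return [
        i
        for i in candidates
        if max(cosine(open_feats[i], d) for d in domain_feats) >= sim_threshold
    ]
```

Calling `build_coreset(domain_feats, open_feats)` with two lists of equal-length feature vectors returns the positions of the open-set samples that pass both stages. In this sketch the Bloom filter acts only as a cheap pre-filter, and the similarity check removes any false positives it lets through.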

Disclaimer: Curated by HT Syndication.