BharatGen AI Set to Embrace All 22 Scheduled Indian Languages by 2026: Government Announcement

BharatGen: A National Initiative for AI in Indian Languages

BharatGen, a government-backed initiative to build foundational AI models for Indian languages and contexts, is set to cover all 22 scheduled Indian languages by June 2026, the government told the Lok Sabha on Wednesday.

At present, the models cover nine languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada. The roadmap expands coverage to 15 languages by December 2025 with the addition of Assamese, Maithili, Nepali, Odia, Sanskrit, and Sindhi, and to the full list of 22 by mid-2026.

Functionalities of BharatGen AI Models

The models span several modalities: large language models for text, text-to-speech conversion, automatic speech recognition, and vision-language systems. According to the government’s statement, BharatGen has developed pilot applications in agriculture, governance, and defence, which are intended for nationwide deployment once fully operational.

Organisational Structure of BharatGen

BharatGen operates under the Department of Science and Technology’s National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS). The TIH Foundation for IoT and IoE at IIT Bombay acts as the central hub, overseeing programme execution, coordinating academic collaboration nationwide, and building ecosystem partnerships for compute, data, and talent. The IITM Pravartak Technologies Foundation at IIT Madras serves as implementation partner, concentrating on applications in governance, security, and media.

Key Participants in the BharatGen Consortium

  • IIT Bombay: Lead institution responsible for research and integration
  • IIIT Hyderabad: Specialises in vision-language document modelling
  • IIT Madras: Focused on speech model development and evaluation
  • IIT Kanpur: Conducts legal AI research and strategies for multilingual tokenisation
  • IIT Hyderabad: Works on vocabulary optimisation for multilingual LLMs
  • IIT Mandi: Develops inclusive multilingual models and efficient training methods
  • IIM Indore: Engages in Bharat-centric benchmarking and multilingual data collection

Union Minister Dr Jitendra Singh indicated that while BharatGen is currently in its pilot phase and not available for public use, the objective is to implement it across all states and districts. Additional partnerships with research institutions in Karnataka may also be considered.
