India’s AI Future: The Crucial Role of Data Integrity Over Model Development, Says MoSPI Secretary Saurabh Garg

Artificial Intelligence Readiness in India: The Importance of Quality Data


As artificial intelligence moves from experimentation to real-world deployment around the world, India’s AI readiness will depend more on the quality of its data than on the sophistication of its models. Dr Saurabh Garg, Secretary of the Ministry of Statistics and Programme Implementation (MoSPI), made this point on January 22.

AI Systems and Data Quality

Speaking at Nasscom’s Responsible Intelligence Confluence (RICON) 2026, Garg noted that governments and businesses alike are adopting AI systems for applications such as resource allocation, beneficiary identification, demand forecasting, and outcome evaluation. The effectiveness of these systems, however, hinges on the reliability, consistency, and machine-readability of the underlying data.

The Core Issue: Data and Metadata

Garg argued that AI readiness is primarily a data and metadata problem rather than a model problem. Large language models (LLMs) and sophisticated analytical tools, he said, cannot produce accurate results when they are trained on weak or poorly structured data. They also struggle when data comes in inconsistent formats, carries low-quality signals, lacks semantic clarity, or is locked in formats such as PDFs that machines cannot easily process.

Consequences of Poor Data Quality

Poor data quality can lead to serious policy failures. Garg noted that even a well-designed model can inadvertently exclude eligible households from benefits because identifiers do not match across datasets. This is not merely a theoretical concern: the failure stems from the data and from the broader perspective applied to it, not from the model.
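
To make the identifier problem concrete, the following minimal Python sketch uses invented household IDs. The data, the ID formats, and the `normalise` rule are all hypothetical and are not taken from any MoSPI system; they only illustrate how an exact-match lookup can silently drop eligible records that a simple harmonisation step would recover.

```python
# Hypothetical data: an eligibility list and a payments register that encode
# the same household IDs differently (prefix, zero-padding, case, whitespace).
eligible = ["HH-0012", "HH-0345", "HH-0789"]
register = {" hh-12": "District A", "HH-345": "District B", "HH-0789": "District C"}


def exact_match(ids, reg):
    """Return the IDs found in the register by exact string comparison."""
    return [i for i in ids if i in reg]


def normalise(raw):
    """Hypothetical canonical form: trim, upper-case, drop 'HH-' and leading zeros."""
    s = raw.strip().upper()
    if s.startswith("HH-"):
        s = s[3:]
    return s.lstrip("0") or "0"


def harmonised_match(ids, reg):
    """Return the IDs found in the register after normalising both sides."""
    canonical = {normalise(k) for k in reg}
    return [i for i in ids if normalise(i) in canonical]


print(exact_match(eligible, register))       # ['HH-0789']
print(harmonised_match(eligible, register))  # ['HH-0012', 'HH-0345', 'HH-0789']
```

Two of the three eligible households disappear under the exact match. Real harmonisation protocols would normally rely on agreed identifier standards rather than ad hoc string cleaning like this.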

Open Data Is Not Enough

According to Garg, simply releasing datasets is not enough in an AI-centric environment, because open data alone does not guarantee usefulness. Without APIs, standards, quality signals, and documentation of known challenges, open datasets may remain largely invisible to AI applications or may mislead them.

The Role of Metadata

Garg emphasised that metadata, meaning information about where data comes from, how it was created, and what it means, has become a critical part of digital infrastructure. When provenance, lineage, and semantic definitions are readily available, algorithms can produce explainable outputs, which in turn support better decisions based on that data.
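
As an illustration only, a metadata record carrying provenance, lineage, and semantic definitions might look like the Python sketch below. The class name, field names, and example values are hypothetical and do not describe MoSPI’s national metadata structure; the sketch simply shows how a machine can check whether every column in a dataset has a documented meaning.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative, not a standard."""

    title: str
    source_agency: str      # provenance: who produced the data
    collection_method: str  # how the data was created (survey, administrative record, ...)
    lineage: list[str] = field(default_factory=list)           # processing steps applied, in order
    definitions: dict[str, str] = field(default_factory=dict)  # column name -> meaning and unit

    def missing_definitions(self, columns):
        """Return the columns that have no semantic definition."""
        return [c for c in columns if c not in self.definitions]


meta = DatasetMetadata(
    title="District household survey (illustrative)",
    source_agency="Example statistics office",
    collection_method="sample survey",
    lineage=["raw responses", "deduplicated", "weighted"],
    definitions={"hh_id": "household identifier", "hh_size": "number of household members"},
)
print(meta.missing_definitions(["hh_id", "hh_size", "district"]))  # ['district']
```

An undefined column such as `district` is exactly the kind of gap that leaves an algorithm guessing at meaning.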

Government Initiatives for AI-Ready Data

The ministry is working to make government data more AI-ready by introducing a national metadata structure, issuing API design guidelines, launching discovery platforms and microdata portals, and implementing data harmonisation protocols across departments.

Challenges with Sensitive Data

Garg acknowledged, however, that not all data can be made fully open: some administrative records and microdata are sensitive and carry security implications. The solution, he proposed, lies in trustworthy infrastructure, controlled research environments, privacy-preserving techniques, and federated methods that allow analysis without revealing personal information.
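
A very simplified picture of a federated analysis is sketched below in Python: each department counts its own records locally and shares only aggregates, and the combined table suppresses small cells before publication. The departments, records, and the threshold of 5 are hypothetical. Small-cell suppression is a basic disclosure-control technique, not a formal privacy guarantee such as differential privacy, and the sketch is not a description of any system Garg mentioned.

```python
from collections import Counter

# Hypothetical microdata held separately by two departments; in a federated
# setup these records would stay on each department's own systems.
dept_a = (
    [{"district": "X", "eligible": True}] * 4
    + [{"district": "Y", "eligible": True}]
    + [{"district": "X", "eligible": False}] * 2
)
dept_b = [{"district": "X", "eligible": True}] * 3 + [{"district": "Y", "eligible": True}]


def local_counts(records):
    """Runs at each site: returns only counts of eligible records per district."""
    return Counter(r["district"] for r in records if r["eligible"])


def combine(partials, min_cell=5):
    """Runs centrally: sums site-level counts and suppresses small cells.

    Totals below min_cell (an assumed threshold) become None, so very small
    groups that might identify individuals are not published.
    """
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return {district: (n if n >= min_cell else None) for district, n in total.items()}


result = combine([local_counts(dept_a), local_counts(dept_b)])
print(result)  # {'X': 7, 'Y': None}
```

Only per-district totals cross department boundaries, and district Y, with just two eligible households overall, is withheld rather than published.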
