At a Hyderabad AI meet, experts urged the government to prioritise open, India-specific datasets over GPU spending, arguing that true openness and trust in AI depend on publicly accessible training data.
India’s AI challenge is not compute scarcity but the absence of truly open, high-quality training datasets, experts said at a discussion organised by the Software Freedom Law Centre at IIIT Hyderabad. They called on the India AI Mission to prioritise public data collection over GPUs.
“My primary focus for the India AI Mission would be to put all the money, first of all, into collecting data,” one speaker said.
The Mission has allocated Rs 10,372 crore over five years, with this year’s outlay cut to Rs 1,000 crore from Rs 2,000 crore, and only Rs 800 crore utilised last year. Speakers argued the structure is ineffective. “It’s not lakhs of crores… It’s not even a drop in the ocean, compared to megacorps in the US and China,” one said, adding that long procurement cycles leave hardware outdated: “By the time that process was completed, the GPUs had already changed.”
The more pressing gap, speakers said, is a severe shortage of language data. “Everybody is starved of data. And the way out they’re taking is using synthetic data,” a speaker noted, warning this leads to “bad data” and “a bad model.” Even Hindi, they said, lacks adequate, dialect-sensitive coverage.
Experts proposed a government-led effort to crowdsource datasets through teachers and public institutions. “Eight thousand samples of questions, answers, and reasoning—can’t we get 8,000 samples… in Telugu or Hindi? Is that so hard? It’s not.”
They also criticised “openwashing”, the practice of labelling a model open while withholding key components. “Nobody provides you the training data,” one said, stressing that true open source must include data, documentation, and workflows. Frameworks such as the Model Openness Framework were recommended, while Stable Diffusion was cited as an example of fully open AI.
The message was clear: open data, not GPUs, is the foundation of AI sovereignty and accountability.