When data is scarce: How India is building military AI differently



In the global race to build military AI, data has become the most valuable weapon. Superiority in modern warfare is increasingly measured in algorithms, not just in missiles or manpower.

“Whoever owns the data has an edge in AI right now,” says Neeta Trivedi, Co-founder and CEO of defencetech startup Inferigence Quotient and a former scientist at the Defence Research & Development Organisation (DRDO), where she spent 28 years.

But India faces a fundamental challenge: the country doesn’t have nearly enough data.

At a time when geopolitical tensions, from the ongoing Russia–Ukraine war to instability across West Asia, are accelerating the global push toward AI-enabled warfare, the ability to build intelligent defence systems has become a strategic priority for many nations.

Even as the country experiments with AI-led warfare, demonstrated during Operation Sindoor, the gap between India and global military powers remains stark. The US Pentagon has sought about $13.4 billion to advance defence AI capabilities. Estimates suggest China’s People’s Liberation Army is investing between $1 billion and $2 billion annually on similar technologies.

India’s allocation, by contrast, is far more modest: roughly $60 million spread evenly across five years, per a 2023 report from the think tank Delhi Policy Group.

The disparity extends beyond funding. AI systems require enormous volumes of data to train effectively. The US hosts more than 5,000 AI data centres, while India has about 150. In practical terms, this means India is developing military AI with far smaller datasets than its geopolitical rivals.

Yet the question is not just how much data India has, but how it uses what already exists.

India’s military AI has a data gap

Trivedi says large volumes of potentially valuable military data already exist but remain largely untapped. For decades, unmanned aerial vehicles have captured vast amounts of surveillance footage. Much of that data has remained archived at ground stations.

“The videos come to the ground stations and have just been sitting there for decades,” she says. “The data needs to be extracted and labelled for training. And it’s not just video; there are other sensors like radar. Private companies can’t access them because the data is classified, and the military hasn’t always had the bandwidth to process it.”


Even when datasets are available, they present another challenge. Much of the data generated in defence testing environments is relatively ‘clean’ compared to the messy, unpredictable conditions of real-world conflict. 

One approach, Trivedi says, is to begin training AI models with available test data and then refine them using limited operational data once access is possible. At the same time, researchers are exploring whether effective AI systems can be trained with significantly smaller datasets.

“They say if you want to familiarise a child with an elephant, you show them a few pictures and the child can identify the animal,” she says. “You don’t need millions of images. Researchers are exploring whether something similar can work for AI. Of course, the human brain works differently from machine learning models, but there is interesting work happening in that area.”

The search for an indigenous solution

One suggestion that often comes up is whether India could train its AI systems using datasets from ongoing conflicts such as the Russia-Ukraine war. But experts say that approach is largely impractical.

Shashidhara BP, former managing director of the Aeronautical Development Establishment under DRDO, argues that such data is unlikely to be accessible.

“It is almost impossible to use data generated in other war scenarios. Such datasets are typically proprietary to the respective defence forces and are almost always encrypted to prevent misuse,” he says.

Instead, he believes India’s long-term strategy must focus on developing indigenous AI systems tailored to its own defence datasets. 

“There are a number of initiatives under way in both the government and private sector. As these technologies evolve and integrate, they can support the needs of the armed forces in a rapidly changing warfare environment,” he says. “Once we develop our own language models and train them on datasets generated across multiple applications, we will have a proven AI platform that can be deployed not just in defence but across other sectors as well.”

Building sovereign military AI 

For many Indian defencetech startups, that push toward indigenous AI development has already begun.

Jayant Khatri, Co-founder and CEO of Apollyon Dynamics, says the company deliberately avoids using international LLMs or APIs, including tools like ChatGPT, to minimise the risk of sensitive data exposure.

“We develop our own algorithms that run on edge computing systems,” he says, referring to computing that processes data close to the source rather than relying on cloud infrastructure. “We combine high-fidelity simulations with hardware-in-the-loop testing to create controlled training environments. Every field deployment feeds back into the system, creating a closed-loop learning process,” he says.

The push toward sovereign AI platforms is also visible in initiatives such as Project Ekam, developed by the startup Neuralix. The platform, described as India’s first proprietary Defence AI-as-a-Service system, was inaugurated by Defence Minister Rajnath Singh in December 2025.

According to Neuralix CEO and Founder Vikram Jayaram, working with military datasets is fundamentally different from handling the curated data used in most commercial GenAI systems.

“Military data is extremely fragmented,” says Jayaram, who has spent nearly 27 years in the AI and machine learning industry. “These data sources were never designed for language models to ingest and process easily. The computing architecture is also limited for large-scale training permutations. It has taken years just to understand how to curate the data and determine whether building smaller, specialised models is more effective in most situations.”

No Ferraris on broken roads

Jayaram believes India should avoid joining the global race to build ever-larger language models. 

Instead, he says the focus should be on solving specific decision support system problems with frameworks designed for India’s security constraints, particularly its limited access to large-scale datasets. Trying to replicate the approach taken by countries like the US could ultimately prove counterproductive.

“Building large language models is like building a Ferrari. We can try to build one. But if you put a Ferrari on a bad road, the whole thing will tear apart. If the underlying data, infrastructure, or purpose is poor, it will never perform,” he says. “Our priority should be to address the fragmented problems we actually face. That’s why we are building smaller language models, or SLMs. These are simpler to manage and iterate, and over time multiple specialised models can come together to form a larger system.”
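The idea of multiple specialised small models combining into a larger system is often implemented as a router that dispatches each query to the relevant specialist. The sketch below is purely illustrative, assuming a simple keyword heuristic; the model names and dispatch logic are hypothetical and not Neuralix's actual design.

```python
# Hypothetical router over specialised small models (SLMs).
# Each "model" is a stub; in practice these would be separately
# trained domain models. Names and keywords are illustrative.

def radar_model(query: str) -> str:
    return f"[radar-SLM] analysing: {query}"

def imagery_model(query: str) -> str:
    return f"[imagery-SLM] analysing: {query}"

def logistics_model(query: str) -> str:
    return f"[logistics-SLM] analysing: {query}"

SPECIALISTS = {
    "radar": radar_model,
    "image": imagery_model,
    "supply": logistics_model,
}

def route(query: str) -> str:
    """Dispatch the query to the first specialist whose domain
    keyword appears in it; fall back to a default model."""
    for keyword, model in SPECIALISTS.items():
        if keyword in query.lower():
            return model(query)
    return logistics_model(query)

print(route("Classify this radar track"))  # handled by the radar specialist
```

Each specialist stays small enough to retrain and audit independently, which is the manageability advantage Jayaram describes.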

Ekam has already been deployed within the Indian defence ecosystem, while work continues in parallel on more sophisticated language models.

Jayaram also points to ongoing research aimed at training robust AI systems with smaller datasets, an approach that Trivedi had earlier highlighted.

“When training samples are scarce, we use mathematical transformations to generate additional samples over time. Once enough of these training datasets are created, they can be used to train models more effectively,” he says.
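In machine learning this technique is generally called data augmentation: label-preserving transformations (flips, rotations, noise, shifts) multiply a scarce dataset. A minimal sketch, assuming image-like sensor frames represented as lists of rows; the specific transformations are illustrative, not Jayaram's actual pipeline.

```python
import random

def augment(frame, rng, noise=0.05):
    """Generate extra training samples from one scarce original via
    simple label-preserving mathematical transformations."""
    h_flip = [row[::-1] for row in frame]                                # mirror
    rot180 = [row[::-1] for row in frame[::-1]]                          # rotate 180
    jitter = [[v + rng.gauss(0, noise) for v in row] for row in frame]   # sensor noise
    shift = frame[1:] + frame[:1]                                        # vertical shift
    return [h_flip, rot180, jitter, shift]

rng = random.Random(0)
frame = [[rng.random() for _ in range(8)] for _ in range(8)]  # stand-in for one sensor frame
extra = augment(frame, rng)
print(len(extra))  # 4 synthetic samples from one original
```

Applied across an archive of UAV footage, even a handful of such transformations per frame can multiply the effective training set several-fold.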

As warfare shifts deeper into the era of intelligent automation, the ultimate goal is to build AI systems capable of operating with high precision and minimal collateral damage. Achieving that level of reliability, however, remains a major challenge.

“At the end of the day, it’s not a level playing field. A terrorist doesn’t care who gets harmed. They simply want to create chaos. But when we act against them, we cannot afford to hit the wrong target. Achieving that level of accuracy is vital,” Trivedi says.

Despite the constraints, experts say the defence sector is steadily building momentum.

“Our work reflects a thoughtful integration of language models and AI into operational workflows,” says a retired major general of the Indian Army, seeking anonymity. “The ecosystem is showing how sovereign technology can quietly yet powerfully enhance intelligence, inference, and situational awareness in modern military operations.”


