Highlights
Sarvam-M: India’s Ambitious Large Language Model
Indian AI startup Sarvam has introduced its flagship large language model (LLM), Sarvam-M. This 24-billion-parameter, open-weights hybrid model is built on Mistral Small. Positioned as a versatile, locally relevant alternative in the crowded global LLM landscape, Sarvam-M has drawn attention for strong performance in Indian languages, mathematics, and programming, though it has also met with scepticism from parts of the tech community.
Understanding 24 Billion Parameters
Parameters are the internal settings a language model uses to process and generate text. They can be thought of as dials and switches that are calibrated during training to improve the model’s grasp of grammar, context, facts, reasoning, and more. Parameter count matters: broadly, the more parameters a model has, the more refined its understanding and outputs can be, although the quality of the training data and methods matters just as much. With 24 billion parameters, Sarvam-M sits in the mid-to-large range: considerably larger than open models like Mistral 7B, but smaller than leading systems such as OpenAI’s GPT-4 or Google’s Gemini 1.5 Pro.
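To make the idea concrete, the toy sketch below builds a miniature network and counts its parameters the same way practitioners count a real model’s. It assumes PyTorch and is purely illustrative; it is not Sarvam-M’s architecture.

```python
import torch.nn as nn

# Toy network, NOT Sarvam-M's architecture: every weight and bias below is one
# "parameter" - a number that training adjusts. Sarvam-M has roughly 24 billion of them.
tiny_model = nn.Sequential(
    nn.Embedding(1000, 64),  # vocabulary of 1,000 tokens, 64-dimensional vectors
    nn.Linear(64, 64),       # one small hidden transformation
    nn.Linear(64, 1000),     # project back to scores over the vocabulary
)

n_params = sum(p.numel() for p in tiny_model.parameters())
print(f"Toy model parameters: {n_params:,}")  # about 133,000, versus ~24,000,000,000
```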
Comparing Sarvam-M with Leading Models
Here’s a snapshot of Sarvam-M’s position relative to other prominent models:
| Model | Parameters | Strengths |
|---|---|---|
| Sarvam-M | 24B | Indian languages, mathematics, programming |
| OpenAI GPT-4 | 1.8T (estimated) | General reasoning, coding, multilingual tasks |
| Google Gemini 1.5 Pro | 200B+ (estimated) | Multimodal capabilities, advanced reasoning and coding |
| Meta Llama 3 70B | 70B | Reasoning, coding, and multilingual tasks |
| Anthropic Claude 3.7 Sonnet | 2T (estimated) | High-quality summarisation, reasoning, and content generation |
Sarvam-M is smaller than the largest proprietary models but excels in specific areas, particularly mathematics and reasoning in Indian languages. It trails on English-focused benchmarks such as MMLU by roughly 1%, pointing to room for improvement in broader linguistic generalisation.
The Development Process of Sarvam-M
The creation of Sarvam-M involved a three-phase training approach:
- Supervised Fine-Tuning (SFT): This phase utilized high-quality prompts and responses to develop the model’s conversational and reasoning skills while reducing cultural bias.
- Reinforcement Learning with Verifiable Rewards (RLVR): The model learned to follow instructions and solve logic-heavy problems through carefully designed rewards and feedback signals; a minimal illustration of a verifiable reward follows this list.
- Inference Optimisation: Advanced compression techniques, such as FP8 quantisation, and improved decoding strategies enhanced efficiency and speed, though scalability challenges in high-concurrency settings remain.
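The sketch below shows, in rough terms, what a “verifiable reward” can look like for a maths prompt with a known answer. It is an assumption about the general RLVR technique, not Sarvam’s actual reward code, and the function name `math_reward` is hypothetical.

```python
import re

def math_reward(model_output: str, reference_answer: str) -> float:
    """Verifiable reward: 1.0 if the last number in the output matches the reference, else 0.0.

    Because correctness is checked programmatically, no learned reward model is needed.
    (Illustrative sketch only; not Sarvam's implementation.)
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer else 0.0

print(math_reward("Step 1: 12 * 4 = 48. So the answer is 48.", "48"))  # 1.0
print(math_reward("I believe the answer is 50.", "48"))                # 0.0
```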
Significance of Sarvam-M in the AI Landscape
Sarvam-M supports ten Indian languages and can answer competitive-exam questions in Hindi, positioning it as a valuable tool for local education and translation initiatives. The model showed an 86% improvement on a test combining mathematics with romanised Indian languages, indicating strong multilingual reasoning ability.
While there have been questions regarding whether Sarvam-M is “good enough” to compete on a global scale, its launch has notably elevated the profile of Indian contributions in the AI domain. The model is now available to the public through Sarvam’s API and on Hugging Face, allowing developers to build on it, test it, and contribute improvements.
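For developers who want to try the open weights locally, the sketch below follows the standard Hugging Face transformers chat workflow. The repository id "sarvamai/sarvam-m" and the Hindi prompt are assumptions made for illustration, and running a 24B model this way requires a GPU with substantial memory (or aggressive quantisation).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask a question in Hindi: "What is the capital of India?"
messages = [{"role": "user", "content": "भारत की राजधानी क्या है?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```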
Although it may not yet match the most sophisticated LLMs, Sarvam-M marks a meaningful step toward democratising AI development in India, particularly for users who need support beyond English.