Revolutionizing AI: Sarvam M Shines in Mathematics and Indian Languages, Outclassing Competitors

Sarvam-M: India’s Ambitious Large Language Model

Indian AI startup Sarvam has introduced its flagship large language model (LLM), Sarvam-M, a 24-billion-parameter hybrid open-weights model built on Mistral Small. Positioned as a versatile, locally relevant alternative in the competitive global LLM landscape, Sarvam-M has drawn attention for its strong performance in Indian languages, mathematics, and programming, although it has also met with scepticism from parts of the tech community.

Understanding 24 Billion Parameters

Parameters are the internal numerical settings a language model uses to process and generate text. They can be thought of as dials and switches that are calibrated during training to sharpen the model’s grasp of grammar, context, facts, and reasoning. Parameter count matters: more parameters generally allow more refined understanding and outputs, although the quality of the training data and methods is just as important. With 24 billion parameters, Sarvam-M is a mid-to-large-scale model, considerably larger than open models like Mistral 7B but smaller than leading systems such as OpenAI’s GPT-4 or Google’s Gemini 1.5 Pro.
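
To make that scale concrete, here is a rough back-of-the-envelope sketch (not taken from Sarvam’s documentation) of how much memory 24 billion parameters occupy at common numeric precisions; serving a model in practice needs additional memory for activations and the key-value cache.

```python
# Back-of-the-envelope sketch: memory needed just to store 24B weights
# at common numeric precisions. Illustrative only, not from Sarvam's docs.
PARAM_COUNT = 24_000_000_000  # Sarvam-M's reported parameter count

BYTES_PER_PARAM = {
    "fp32 (full precision)": 4.0,
    "fp16/bf16 (typical inference)": 2.0,
    "int8 (8-bit quantisation)": 1.0,
    "int4 (4-bit quantisation)": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = PARAM_COUNT * nbytes / 1024**3
    print(f"{precision:>30}: ~{gib:,.0f} GiB to hold the weights alone")
```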

Comparing Sarvam-M with Leading Models

Here’s a snapshot of Sarvam-M’s position relative to other prominent models:

Model | Parameters | Strengths
Sarvam-M | 24B | Indian languages, maths, programming
OpenAI GPT-4 | 1.8T (estimated) | General reasoning, coding, multilingual
Gemini 1.5 Pro | 200B+ | Multimodal capabilities, advanced reasoning and coding performance
Llama 3 70B | 70B | Reasoning, coding, and multilingual tasks
Anthropic Claude 3.7 Sonnet | 2T (estimated) | High-quality summarisation, reasoning, and content generation

Sarvam-M ranks below the largest proprietary models in size but excels in specific areas, particularly mathematics and reasoning in Indian languages. However, it trails on English-focused benchmarks such as MMLU by roughly one percentage point, indicating room for improvement in broader linguistic generalisation.

The Development Process of Sarvam-M

The creation of Sarvam-M involved a three-phase training approach: supervised fine-tuning (SFT) on curated prompts and responses, reinforcement learning with verifiable rewards (RLVR) to strengthen maths, coding, and instruction following, and inference optimisation to make the model faster and cheaper to serve.
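
As a rough illustration of the first phase, the sketch below shows a single supervised fine-tuning step with PyTorch and Hugging Face transformers: next-token prediction on a prompt/response pair. The base checkpoint ID and the toy Hindi example are assumptions for illustration only; Sarvam has not published this exact code, and real SFT runs over a full curated dataset across many GPUs.

```python
# Minimal, generic sketch of one supervised fine-tuning (SFT) step.
# The base checkpoint ID below is an assumption, not confirmed by Sarvam.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-Small-24B-Base-2501"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A toy Hindi maths prompt/response pair standing in for a curated SFT dataset.
text = "प्रश्न: 12 × 15 = ?\nउत्तर: 180" + tokenizer.eos_token
batch = tokenizer(text, return_tensors="pt")

# Causal LMs in transformers return the next-token loss when labels are given.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```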

Significance of Sarvam-M in the AI Landscape

Sarvam-M supports ten Indian languages and can answer competitive exam questions in Hindi, positioning it as a valuable tool for local education and translation initiatives. The model showed an 86% improvement on a test combining mathematics with romanised Indian languages, demonstrating strong multilingual reasoning capability.

While there have been questions regarding whether Sarvam-M is “good enough” to compete on a global scale, its launch has notably elevated the profile of Indian contributions in the AI domain. The model is now available to the public through Sarvam’s API and on Hugging Face, allowing developers to create, test, and contribute further advancements.
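
For developers who want to try the open weights, here is a minimal sketch using Hugging Face transformers. The repository ID "sarvamai/sarvam-m", the presence of a chat template, and the sample Hindi prompt are assumptions made for illustration; consult Sarvam’s model card for exact usage.

```python
# Minimal sketch of running the open weights locally with transformers.
# The repo ID and chat-template usage are assumptions, not verified claims.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"  # assumed Hugging Face repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# A sample competitive-exam style question in Hindi ("What is the HCF of 48 and 36?").
messages = [{"role": "user", "content": "गणित: 48 और 36 का महत्तम समापवर्तक क्या है?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```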

Although it may not yet match the most sophisticated LLMs, Sarvam-M represents a meaningful step towards democratising AI development in India, particularly for users who need support beyond English.
