Chinese companies continue to launch AI models that rival the capabilities of systems developed by OpenAI and other U.S.-based AI companies.
This week, MiniMax, an Alibaba- and Tencent-backed startup that has raised approximately $850 million in venture capital and is valued at more than $2.5 billion, unveiled three new models: MiniMax-Text-01, MiniMax-VL-01, and T2A-01-HD. MiniMax-Text-01 is a text-only model, while MiniMax-VL-01 can understand both images and text. T2A-01-HD generates audio, specifically speech.
MiniMax claims that MiniMax-Text-01, which has 456 billion parameters, outperforms models such as Google’s recently released Gemini 2.0 Flash on benchmarks like MATH and SimpleQA, which measure a model’s ability to answer math problems and fact-based questions. Parameter count roughly corresponds to a model’s problem-solving ability, and models with more parameters generally perform better than those with fewer.
As for MiniMax-VL-01, MiniMax claims it rivals Anthropic’s Claude 3.5 Sonnet on evaluations that require multimodal understanding, such as ChartQA, which tests a model’s ability to answer questions about graphs and diagrams (for instance, “What is the peak value of the orange line in this graph?”). That said, MiniMax-VL-01 falls short of Gemini 2.0 Flash on many of these tests, and OpenAI’s GPT-4o and Meta’s Llama 3.1 also beat it on several.
Notably, MiniMax-Text-01 has an exceptionally large context window. A model’s context window is the input (such as text) that the model considers before generating output. With a context window of 4 million tokens, MiniMax-Text-01 can analyze approximately 3 million words at once, or just over five copies of “War and Peace.”
For context, MiniMax-Text-01’s context window is roughly 31 times the size of GPT-4o’s and Llama 3.1’s.
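The context-window comparisons above can be reproduced with some rough arithmetic. The Python sketch below is illustrative only. Apart from the 4-million-token figure, every constant is an assumption rather than a number from MiniMax: a 128,000-token window for GPT-4o and Llama 3.1, a rule of thumb of about 0.75 words per token (which is what "4 million tokens, 3 million words" implies), and a commonly cited length of roughly 587,000 words for an English translation of "War and Peace." The function name is hypothetical.

```python
# Back-of-the-envelope check of the context-window figures quoted in the article.
# Every constant except MINIMAX_TOKENS is an assumption, not a MiniMax figure.

MINIMAX_TOKENS = 4_000_000     # context window stated in the article
COMPARISON_TOKENS = 128_000    # assumed window size for GPT-4o and Llama 3.1
WORDS_PER_TOKEN = 0.75         # assumed rule of thumb (4M tokens ~ 3M words)
WAR_AND_PEACE_WORDS = 587_000  # assumed, commonly cited English word count


def context_window_summary(tokens: int) -> dict:
    """Convert a token budget into the rough comparisons used in the article."""
    words = tokens * WORDS_PER_TOKEN
    return {
        "approx_words": words,
        "war_and_peace_copies": words / WAR_AND_PEACE_WORDS,
        "ratio_to_comparison": tokens / COMPARISON_TOKENS,
    }


if __name__ == "__main__":
    for name, value in context_window_summary(MINIMAX_TOKENS).items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the ratio works out to about 31 and the word count to a little over five copies of the novel, in line with the article's figures. Real tokenizers vary by language and content, so the words-per-token conversion is only approximate.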
The final model introduced this week by MiniMax, T2A-01-HD, is an audio generator tailored for speech synthesis. T2A-01-HD can create a synthetic voice with adjustable pace, tone, and quality in about 17 different languages, including English and Chinese, and can replicate a voice from merely 10 seconds of audio input.
MiniMax has not published benchmark results comparing T2A-01-HD with other audio-generating models. To this reporter’s ear, however, T2A-01-HD’s outputs sound on par with those of audio models from Meta and firms such as PlayAI.
With the exception of T2A-01-HD, which is available only through MiniMax’s API and Hailuo AI platform, MiniMax’s new models can be downloaded from GitHub and the AI development platform Hugging Face.
The models’ open availability comes with caveats. MiniMax-Text-01 and MiniMax-VL-01 are not truly open source, because MiniMax has not released the components (such as training data) that developers would need to recreate them from scratch. They are also governed by MiniMax’s restrictive license, which bars developers from using the models to improve competing AI systems and requires platforms with more than 100 million monthly active users to obtain a special license from MiniMax.
MiniMax was founded in 2021 by former staff of SenseTime, one of China’s leading AI companies. Its portfolio includes apps like Talkie, an AI-driven role-playing platform akin to Character AI, as well as text-to-video models that MiniMax has made available through Hailuo.
Certain offerings from MiniMax have sparked minor controversies.
Talkie, which was removed from Apple’s App Store in December for unspecified “technical” reasons, features AI avatars of public figures such as Donald Trump, Taylor Swift, Elon Musk, and LeBron James, none of whom appear to have consented to being included in the app.
In December, Broadcast magazine reported that MiniMax’s video generators could reproduce the logos of British television channels, suggesting that MiniMax’s models were trained on content from those channels. MiniMax is also reportedly being sued by iQIYI, a Chinese streaming service that alleges MiniMax unlawfully used iQIYI’s copyrighted materials for training.
The launch of MiniMax’s latest models comes just days after the outgoing Biden Administration proposed stricter export rules and restrictions on AI technologies for Chinese companies. Chinese companies were already barred from buying advanced AI chips, but if the new rules take effect as written, they will face tighter limits on both the semiconductor technology and the models needed to build sophisticated AI systems.
On Wednesday, the Biden Administration announced further measures aimed at preventing the transfer of advanced chips to China. Chip manufacturers and packaging companies wishing to export specific chips will be subject to broader licensing requirements unless they implement more rigorous oversight and due diligence to prevent their products from reaching clients in China.