Elon Musk-owned Grok has reportedly beaten China's DeepSeek AI chatbot. According to a report by Counterpoint Research, Elon Musk's xAI has unveiled Grok-3, its most advanced model to date, which slightly outperforms DeepSeek-R1, OpenAI’s o1 and Google’s Gemini 2. Unlike DeepSeek-R1, Grok-3 is proprietary and was trained using a staggering ~200,000 H100 GPUs on xAI’s supercomputer Colossus, representing a giant leap in computational scale. Incidentally, Grok-3 is the model Elon Musk said in February 2025 he was going offline to work on. "Will be honing product with the team all weekend, so offline until then," Musk posted.
Since February, DeepSeek has grabbed global headlines by open-sourcing its flagship reasoning model DeepSeek-R1, which delivers performance on a par with the world’s frontier reasoning models. What sets it apart isn’t just its elite capabilities, but the fact that it was trained using only ~2,000 NVIDIA H800 GPUs, a scaled-down, export-compliant alternative to the H100, making its achievement a masterclass in efficiency.
Grok-3 represents scale without compromise: 200,000 NVIDIA H100s chasing frontier gains, while DeepSeek-R1 delivers similar performance using a fraction of the compute, signalling that innovative architecture and data curation can rival brute force, according to Counterpoint Research.
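To put “a fraction of the compute” in rough numbers: going only by the GPU counts cited above, the gap works out to roughly 100x. The back-of-envelope sketch below uses just those two figures and ignores per-chip throughput differences between the H100 and H800, as well as training duration, so it indicates scale rather than an exact compute comparison.

```python
# Back-of-envelope comparison of the GPU counts cited in the report.
grok3_h100_gpus = 200_000      # Grok-3 training cluster (xAI's Colossus)
deepseek_r1_h800_gpus = 2_000  # approximate figure cited for DeepSeek-R1

ratio = grok3_h100_gpus / deepseek_r1_h800_gpus
print(f"Grok-3 reportedly used ~{ratio:.0f}x as many GPUs as DeepSeek-R1")
# Note: H100 and H800 differ in per-chip throughput, and training times are not
# public, so this ratio is a rough indicator of scale, not of total compute.
```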
Musk’s xAI has unveiled Grok-3, its most advanced model to date, which slightly outperforms DeepSeek-R1, OpenAI’s o1 and Google’s Gemini 2. “Unlike DeepSeek-R1, Grok-3 is proprietary and was trained using a staggering 200,000 H100 GPUs on xAI’s supercomputer Colossus, representing a giant leap in computational scale,” said Counterpoint Research’s Sun.
Grok-3 embodies the brute-force strategy — massive compute scale (representing billions of dollars in GPU costs) driving incremental performance gains. It’s a route only the wealthiest tech giants or governments can realistically pursue.
“In contrast, DeepSeek-R1 demonstrates the power of algorithmic ingenuity by leveraging techniques like Mixture-of-Experts (MoE) and reinforcement learning for reasoning, combined with curated and high-quality data, to achieve comparable results with a fraction of the compute,” explained Sun.
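As an illustration of the Mixture-of-Experts idea Sun refers to, the toy sketch below shows how a router activates only a few “experts” per token, so most of the model’s parameters sit idle on any single forward pass. It is a minimal, hypothetical example with made-up sizes, not DeepSeek-R1’s actual architecture or training setup.

```python
# Toy Mixture-of-Experts (MoE) layer: a router picks TOP_K of N_EXPERTS experts
# for each token, so only a small slice of the parameters does work per token.
# All sizes here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts (hypothetical)
TOP_K = 2       # experts activated per token (hypothetical)
D_MODEL = 16    # token embedding size (hypothetical)

# Each expert is a simple feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
# The router scores every expert for a given token.
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts and mix their outputs."""
    scores = token @ router_w                     # one score per expert
    top = np.argsort(scores)[-TOP_K:]             # indices of the k best experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k
    # Only TOP_K of the N_EXPERTS weight matrices are touched for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape, f"active experts per token: {TOP_K}/{N_EXPERTS}")
```

The efficiency comes from the fact that compute per token scales with the number of active experts rather than the total parameter count, which is one reason MoE models can approach dense-model quality at lower training and inference cost.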
Grok-3 proves that throwing 100x more GPUs at a problem can yield marginal performance gains quickly. But it also highlights rapidly diminishing returns on investment (ROI), as most real-world users see minimal benefit from incremental improvements. In essence, the report says, DeepSeek-R1 is about achieving elite performance with minimal hardware overhead, while Grok-3 is about pushing boundaries by any computational means necessary.
Will be honing product with the team all weekend, so offline until then
— Elon Musk (@elonmusk) February 16, 2025