DeepSeek to release long-awaited AI model in new challenge to US rivals

DeepSeek is set to release its latest large language model next week, more than a year after its last major release, in a fresh test of China’s ambitions to challenge US rivals in AI.

The Hangzhou-based lab plans to unveil V4, a “multimodal” model with picture, video and text-generating functions, according to two people familiar with the matter.

DeepSeek has worked with Chinese AI chipmakers Huawei and Cambricon to optimise V4 for their latest products, according to two people with knowledge of those arrangements.

The move signals broader Chinese efforts to reduce reliance on Nvidia’s market-leading AI chips, which are subject to US export controls designed to curb the country’s technological rise.

DeepSeek’s new release is timed ahead of next week’s annual parliamentary “Two Sessions” meetings, which start on March 4. The high-profile political gathering could further cement DeepSeek’s status as a national AI champion.

It will be the first major model launch by DeepSeek since January 2025, when it unveiled its R1 reasoning model. The company claimed to have built a system comparable to leading Silicon Valley models using only a fraction of the computing power.

That release sent shockwaves through US tech stocks, in what some experts described as a “Sputnik” moment signalling China’s rapid advance as an AI power.

Since then, DeepSeek has issued incremental updates rather than a full new model launch, allowing domestic rivals including Alibaba and Moonshot to capture demand for low-cost, open-source Chinese models.

DeepSeek’s effort to optimise V4 for Chinese-made chips is expected to bolster local demand for those semiconductors and accelerate the shift away from US chipmakers Nvidia and AMD for “inference” — generating responses from a trained model.

Reuters was first to report on DeepSeek’s work with Huawei and Cambricon.

DeepSeek has not worked with Nvidia to optimise the new model for the US company’s products, according to another person with knowledge of the matter.

Nvidia continues to dominate the market for training chips, particularly for the computationally intensive pre-training phase in which models ingest vast amounts of data.

The FT previously reported that DeepSeek had attempted to carry out this initial training on Huawei hardware but encountered technical difficulties.

Last year’s R1 release was published alongside a detailed technical report on DeepSeek’s engineering techniques that used Nvidia chips more efficiently to train and run its model.

DeepSeek was praised for sharing its training methods for developing a “reasoning model”, which allowed other labs to study and implement its findings. Reasoning models are designed to solve complex problems by breaking them into smaller steps.

DeepSeek is expected to publish a shorter technical note alongside V4 next week, followed by a more comprehensive report about a month later, according to a person with direct knowledge of the plans.

Earlier in the week, Anthropic accused DeepSeek and two other Chinese AI labs of “distillation attacks” on its models — the practice of training smaller models on the outputs of more advanced systems, allowing them to replicate the US company’s performance without using the same computing resources.
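To illustrate the idea, a minimal sketch of the distillation objective follows. This is a generic, simplified example (temperature scaling and a cross-entropy loss against the teacher’s output distribution), not a description of how any of the companies named here actually train their models; all logit values are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalise into a probability distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened output distribution and
    # the student's: the student is trained to mimic the larger model's
    # outputs rather than hard ground-truth labels.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# A student whose outputs resemble the teacher's incurs a lower loss.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

Minimising this loss over many teacher outputs is what lets a smaller model approximate a larger one’s behaviour at a fraction of the training cost.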

Huawei, DeepSeek and Cambricon did not respond to requests for comment.

Financial Times
