Meet DeepSeek: the Chinese start-up that is changing how AI models are trained
Hangzhou-based DeepSeek is 2025’s ‘biggest dark horse’ in open-source large language models, Nvidia research scientist Jim Fan says
Ben Jiang in Beijing and Bien Perez in Hong Kong
Published: 9:00pm, 1 Jan 2025
Chinese start-up DeepSeek has emerged as “the biggest dark horse” in the open-source large language model (LLM) arena in 2025, just days after the firm made waves in the global artificial intelligence (AI) community with its latest release.
That assessment came from Jim Fan, a senior research scientist at Nvidia and lead of its AI Agents Initiative, in a New Year’s Day post on social-media platform X, following the Hangzhou-based start-up’s release last week of its namesake LLM, DeepSeek V3.
“[The new AI model] shows that resource constraints force you to reinvent yourself in spectacular ways,” Fan wrote, referring to how DeepSeek developed the product at a fraction of the capital outlay that other tech companies invest in building LLMs.
DeepSeek V3 comes with 671 billion parameters and was trained in around two months at a cost of US$5.58 million, using significantly fewer computing resources than models developed by bigger tech firms such as Facebook parent Meta Platforms and ChatGPT creator OpenAI.
LLM refers to the technology underpinning generative AI services such as ChatGPT. In AI, a high number of parameters is pivotal in enabling an LLM to adapt to more complex data patterns and make precise predictions. Open source gives public access to a software program’s source code, allowing third-party developers to modify or share its design, fix broken links or scale up its capabilities.
DeepSeek’s development of a powerful LLM at less cost than what bigger companies spend shows how far Chinese AI firms have progressed, despite US sanctions that have largely blocked their access to advanced semiconductors used for training models.
Leveraging new architecture designed to achieve cost-effective training, DeepSeek required just 2.78 million GPU hours – the total amount of time that a graphics processing unit is used to train an LLM – for its V3 model. DeepSeek’s training process used Nvidia’s China-tailored H800 GPUs, according to the start-up’s technical report posted on December 26, when V3 was released.
That was substantially less than the 30.8 million GPU hours that Meta needed to train its Llama 3.1 model on Nvidia’s more advanced H100 chips, which are not allowed to be exported to China.
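The figures quoted above can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything from DeepSeek or Meta: the function names are made up for illustration, the per-GPU-hour rate and the average GPU count are derived values the article does not state, and "around two months" is assumed to mean 60 days.

```python
# Back-of-the-envelope check of the training figures quoted in the article.
# All helper names are hypothetical; only the four constants come from the text.

DEEPSEEK_V3_GPU_HOURS = 2.78e6   # H800 GPU hours, per the article
LLAMA_3_1_GPU_HOURS = 30.8e6     # H100 GPU hours, per the article
DEEPSEEK_V3_COST_USD = 5.58e6    # reported training cost, per the article
TRAINING_DAYS = 60               # ASSUMPTION: "around two months" taken as 60 days


def gpu_hour_ratio(baseline_hours: float, model_hours: float) -> float:
    """How many times more GPU hours the baseline used than the model."""
    return baseline_hours / model_hours


def implied_cost_per_gpu_hour(total_cost_usd: float, gpu_hours: float) -> float:
    """Cost per GPU hour implied by a total cost and a GPU-hour count."""
    return total_cost_usd / gpu_hours


def implied_average_gpus(gpu_hours: float, days: float) -> float:
    """Average number of GPUs running around the clock for the given days."""
    return gpu_hours / (days * 24)


if __name__ == "__main__":
    ratio = gpu_hour_ratio(LLAMA_3_1_GPU_HOURS, DEEPSEEK_V3_GPU_HOURS)
    rate = implied_cost_per_gpu_hour(DEEPSEEK_V3_COST_USD, DEEPSEEK_V3_GPU_HOURS)
    gpus = implied_average_gpus(DEEPSEEK_V3_GPU_HOURS, TRAINING_DAYS)
    print(f"Llama 3.1 used about {ratio:.1f}x the GPU hours of DeepSeek V3")
    print(f"Implied cost: about US${rate:.2f} per GPU hour")
    print(f"Implied average cluster size: about {gpus:.0f} GPUs")
```

On these numbers, Llama 3.1 used roughly 11 times as many GPU hours, the reported cost works out to about US$2 per GPU hour, and the run averages out to a little over 1,900 GPUs under the 60-day assumption. The ratio compares hours on two different chips (H800 versus H100), so it is a rough guide rather than a like-for-like measure.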
“DeepSeek V3 looks to be a stronger model at only 2.8 million GPU hours,” computer scientist Andrej Karpathy – a founding team member at OpenAI – said in his X post on December 27.
Karpathy’s observation prompted Fan to respond on the same day in a post on X: “Resource constraints are a beautiful thing. Survival instinct in a cutthroat AI competitive land is a prime driver for breakthroughs.”
“I’ve been following DeepSeek for a long time. They had one of the best open coding models last year,” Fan wrote. “Superior OSS [open-source software] models put huge pressure on commercial, frontier LLM companies to move faster.”
The founder of cloud computing start-up Lepton AI, Jia Yangqing, echoed Fan’s perspective in an X post on December 27. “It is simple intelligence and pragmatism at work: given a limit of computation and manpower present, produce the best outcome with smart research,” wrote Jia, who previously served as a vice-president at Alibaba Group Holding, owner of the South China Morning Post.
DeepSeek did not immediately respond to a request for comment.
The start-up was reportedly spun off in 2023 by hedge-fund manager High-Flyer Quant. The person behind DeepSeek is High-Flyer Quant founder Liang Wenfeng, who had studied AI at Zhejiang University.
In an interview with Chinese online media outlet 36Kr in May 2023, Liang said High-Flyer Quant had already bought more than 10,000 GPUs before the US government imposed AI chip restrictions on China. That investment laid the foundation for DeepSeek to operate as an LLM developer. Liang said DeepSeek also receives funding support from High-Flyer Quant.
Most developers at DeepSeek are either fresh graduates, or people early in their AI career, following the company’s preference for ability more than experience in recruiting new employees.
DeepSeek’s V3 model, however, has also stirred some controversy because it had mistakenly identified itself as OpenAI’s ChatGPT on certain occasions.
Lucas Beyer, a researcher at Microsoft-backed OpenAI, said in an X post last Friday that DeepSeek V3’s misidentification was prompted by this simple question: “What model are you?”
Still, V3 is not the first AI model struck by identity confusion. Machine-learning expert Aakash Kumar Nain wrote in a post on X that it was a common mistake made across various AI models because “a lot of data available on the internet has already been GPT-contaminated”.
A group of researchers from China’s Shandong University and Drexel University and Northeastern University in the US echoed Nain’s view. Out of 27 AI models these researchers tested, they found that a quarter exhibited identity confusion, which “primarily stems from hallucinations rather than reuse or replication”.
As of Tuesday, DeepSeek’s V3 LLM was still ranked as the most popular AI model on Hugging Face, the world’s largest online machine-learning and open-source AI community.
https://www.scmp.com/tech/tech-trends/article/3293050/meet-deepseek-chinese-start-changing-how-ai-models-are-trained
Jim Fan, a senior research scientist at semiconductor design giant Nvidia, says he has been closely following developments at artificial intelligence start-up DeepSeek. Photo: SCMP
Keep an eye on this story of DeepSeek as it is an important story in the AI competition.
As we see again, only two countries are racing ahead in new technology: America and China. Space, internet platforms, weapons, it is the same two countries. Now this competition is intensifying in AI too, in the realm of large language models.
The point here about DeepSeek is that it was able to outperform its American competitors on some benchmarks.
Yet, they did not have the best hardware, and virtually no funding.
The American companies spent billions and billions and billions and had all the latest Nvidia graphics cards with the best semiconductor chips, and they still got beat.
They got beat by a Chinese company that did not have any of those things.
Goes to show what they say is true about sports. Like hockey or football, the game is played on the ice, the game is not played on paper.
The American companies, and the Biden administration, thought they would win this LLM game with their capital spending and sanctions against Chinese companies. If the Americans won with their LLMs, then they could form a monopoly on LLMs and make the world use their LLMs, for a price.
DeepSeek is open source.
Kekeke ...
Billions and billions of dollars were spent to develop those American LLMs.
Wonder if they hired any Indian coders.