Beijing: DeepSeek is looking to press home its advantage.
The Chinese startup triggered a $1 trillion-plus sell-off in global equities markets last month with a cut-price AI reasoning model that outperformed several Western competitors.
Now, the Hangzhou-based firm is accelerating the launch of the successor to its R1 model, according to three people familiar with the company.
DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics.
The company says it hopes the new model will produce better code and be able to reason in languages beyond English. Details of the accelerated timeline for R2’s release have not been previously reported.
DeepSeek did not respond to a request for comment for this story.
Rivals are still digesting the implications of R1, which was built with less powerful Nvidia chips but is competitive with models developed by U.S. tech giants at a cost of hundreds of billions of dollars.
“The launch of DeepSeek’s R2 model could be a pivotal moment in the AI industry,” said Vijayasimha Alilughatta, chief operating officer of Indian tech services provider Zensar. DeepSeek’s success in creating cost-effective AI models “will likely spur companies worldwide to accelerate their own efforts … breaking the stranglehold of the few dominant players in the field,” he said.
R2 is likely to worry the U.S. government, which has identified leadership in AI as a national priority. Its release may further galvanize Chinese authorities and companies, dozens of which say they have started integrating DeepSeek models into their products.
Little is known about DeepSeek, whose founder Liang Wenfeng became a billionaire through his quantitative hedge fund High-Flyer. Liang, described by a former employer as “low-key and introverted”, has not spoken to any media since July 2024.
Reuters interviewed a dozen former employees, as well as quant fund professionals knowledgeable about the operations of DeepSeek and its parent company. It also reviewed state media articles, social media posts from the companies and research papers dating back to 2019.
They told a story of a company that operated more like a research lab than a for-profit enterprise and was unencumbered by the hierarchical traditions of China’s high-pressure tech industry, even as it became responsible for what many investors see as the latest breakthrough in AI.
Different path
Liang was born in 1985 in a rural village in the southern province of Guangdong. He later earned a communication engineering degree at the elite Zhejiang University.
One of his first jobs was running a research department at a smart imaging firm in Shanghai. His then boss, Zhou Chaoen, told state media on 9 February that Liang had hired award-winning algorithm engineers and operated with a “flat management style”.
At DeepSeek and High-Flyer, Liang has similarly shunned the practices of Chinese tech giants, which are known for rigid top-down management, low pay for young employees and “996”, working from 9 a.m. to 9 p.m. six days a week.
Liang opened his Beijing office within walking distance of Tsinghua University and Peking University, two of China’s most prestigious universities. He regularly delved into technical details and was happy to work alongside the Gen-Z interns and recent graduates who made up the bulk of its workforce, according to two former employees. They also described usually working eight-hour days in a collaborative atmosphere.
“Liang gave us control and treated us as experts. He constantly asked questions and learned alongside us,” said 26-year-old researcher Benjamin Liu, who left the company in September. “DeepSeek allowed me to take ownership of critical parts of the pipeline, which was very exciting.”
Liang did not respond to questions sent via DeepSeek.
While Baidu and other Chinese tech giants were racing to build their consumer-facing versions of ChatGPT in 2023 and profit from the global AI boom, Liang told Chinese media outlet Waves last year that he had deliberately avoided spending heavily on app development, focusing instead on refining the quality of the AI model.
Both DeepSeek and High-Flyer are known for paying generously, according to three people familiar with its compensation practices. At High-Flyer, it is not uncommon for a senior data scientist to make 1.5 million yuan annually, while competitors rarely pay more than 800,000, said one of the people, a rival quant fund manager who knows Liang.
The largesse has been funded by High-Flyer, which became one of China’s most successful quant funds and, even after a government crackdown on the sector, still manages tens of billions of yuan, according to two people in the industry.
Computing power
DeepSeek’s success with low-cost AI models is based on High-Flyer’s decade-long and sizable investment in research and computing power, three people said.
The quant fund was an early pioneer in AI trading, and a top executive said in 2020 that High-Flyer was going “all in” on AI by reinvesting 70% of its revenue, mostly in AI research.
High-Flyer spent 1.2 billion yuan on two supercomputing AI clusters in 2020 and 2021. The second cluster, Fire-Flyer II, was made up of about 10,000 Nvidia A100 chips and was used for training AI models.
DeepSeek had not been established at that time, so the accumulation of computing power caught the attention of Chinese securities regulators, said a person with direct knowledge of the authorities’ thinking.
“Regulators wanted to know why they need so many chips?” the person said. “How were they going to use it? What kind of impact will that have on the market?”
Authorities decided not to intervene, a move that would prove crucial to DeepSeek’s fortunes: the U.S. banned the export of A100 chips to China in 2022, by which point Fire-Flyer II was already in operation.
Beijing now celebrates DeepSeek but has instructed it not to engage with the media without approval, according to a person familiar with Chinese official thinking.
Authorities had asked Liang to keep a low profile because they were worried that too much media hype would attract unnecessary attention, the person said.
China’s cabinet and commerce ministry, as well as China’s securities regulator, did not respond to requests for comment.
As one of the few companies with a large A100 cluster, High-Flyer and DeepSeek were able to attract some of China’s best research talent, two former employees said. “The key advantage of having extensive (computing) resources is that it allows for large-scale experimentation,” said Liu, the former employee.
Some Western AI entrepreneurs, such as Scale AI CEO Alexandr Wang, have claimed that DeepSeek had as many as 50,000 higher-end Nvidia chips that are banned for export to China. He has not produced evidence for the allegation or responded to Reuters’ requests to provide it. DeepSeek has not responded to Wang’s claims. Two former employees attributed the company’s success to Liang’s focus on more cost-effective AI architecture.
The startup used techniques such as mixture-of-experts (MoE) and multihead latent attention (MLA), which incur far lower computing costs, its research papers show.
The MoE technique divides an AI model into different areas of expertise and activates only those relevant to a query, in contrast to more common architectures that use the entire model.
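The routing idea described above can be illustrated with a toy sketch. This is a simplified, generic top-k gating scheme written for illustration, not DeepSeek’s implementation; the function names, the hand-picked gate weights and the four stand-in “experts” are all assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Run only the top_k highest-scoring experts on input x.

    gate_weights holds one (hypothetical) gate vector per expert; the
    gate score is the dot product of that vector with the input.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep the indices of the top_k experts; the rest are never called.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen

# Four stand-in experts that simply scale the input by 1, 2, 3 and 4.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
# Hand-picked gate vectors, chosen only to make the routing easy to follow.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, chosen = moe_forward([3.0, 1.0], gate_weights, experts, top_k=2)
print(chosen)  # [0, 1]: only two of the four experts were run
```

In a real model the gate and the experts are learned neural network layers, but the saving comes from the same step: the experts that are not selected do no work for that query.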
The MLA architecture allows a model to process different aspects of a piece of information simultaneously, which helps to detect important details more effectively.
While competitors such as France’s Mistral have developed MoE-based models, DeepSeek was the first firm to rely heavily on this architecture while achieving parity with more expensively built models.
Analysts at brokerage Bernstein estimated in early February that DeepSeek’s pricing was 20 to 40 times cheaper than what OpenAI charged for equivalent models.
For now, Western and Chinese tech giants have signaled plans to continue heavy AI spending, but DeepSeek’s success with R1 and its earlier V3 model has prompted some to change strategies. OpenAI cut prices this month, while Google’s Gemini has introduced discounted tiers of access. Since the launch of R1, OpenAI has also released an o3-mini model that relies on less computing power.
Adnan Masood of U.S. tech services provider UST told Reuters that his laboratory had run benchmarks that found R1 often used three times as many tokens, or units of data processed by the AI model, for reasoning as OpenAI’s scaled-down model.
State embrace
Even before R1 attracted global attention, there were signs that DeepSeek had found favor with Beijing. In January, state media reported that Liang attended a meeting with Chinese Premier Li Qiang in Beijing as the designated representative of the AI sector, ahead of the leaders of better-known firms.
The subsequent fanfare over the cost competitiveness of its models has buoyed Beijing’s confidence that it can out-innovate the U.S., with Chinese companies and government bodies embracing DeepSeek’s models at a pace not afforded to other firms.
At least 13 Chinese city governments and 10 state-owned energy companies say they have deployed DeepSeek in their systems, while tech giants Lenovo, Baidu and Tencent, owner of China’s biggest social media app WeChat, have integrated DeepSeek’s models into their products.
“Chinese leaders Xi Jinping and Li Qiang have signaled that they support DeepSeek,” said Alfred Wu of Singapore’s Lee Kuan Yew School of Public Policy. “Now everyone just supports it.”
The Chinese embrace comes as governments from South Korea to Italy remove DeepSeek from national app stores, citing privacy concerns.
“If DeepSeek becomes the go-to AI model across Chinese state institutions, Western regulators might see it as another reason to escalate restrictions on AI chips or software collaborations,” said Stephen Wu, an AI expert and founder of hedge fund Carthage Capital.
Further restrictions on advanced AI chips are a challenge that Liang has acknowledged.
“Our problem has never been funding,” he told Waves in July. “It’s the embargo on high-end chips.”