“DeepSeek’s achievements may not be quite as impressive as headlines imply. For starters, the new models have shortcomings. One key reason they perform well despite limited access to advanced chips is their ‘Mixture of Experts’ approach, under which available computing power is concentrated on a set of specialized ‘expert’ subnetworks while less-critical tasks may be undertrained. The models thus excel in certain areas, but their overall performance is less consistent than that of some rivals. For instance, one Chinese AI expert has noted that one of the models performs well on math and coding tests but correctly answered only about half of some other classic AI test questions. In short, the models are specialists adapted to become generalists.
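To make the ‘Mixture of Experts’ idea concrete, here is a minimal PyTorch sketch of the general technique: a learned router sends each token to a small subset of ‘expert’ subnetworks, so only a fraction of the model’s parameters does work on any given token. This illustrates the generic approach, not DeepSeek’s actual architecture; all class names and sizes below are invented for the example.

```python
# Minimal sketch of generic Mixture-of-Experts routing (illustrative only,
# not DeepSeek's architecture). A learned router picks the top-k experts
# for each token, so most parameters sit idle on any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, dim)
        scores = self.router(x)                             # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep the k best experts
        weights = F.softmax(weights, dim=-1)                # normalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts ran per token
```

The trade-off the passage describes follows directly from this design: each expert sees only the slice of data the router sends it, so experts that attract the most training signal (here, hypothetically, math and code) become strong while rarely-routed capabilities can end up undertrained.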
“The US$5.6mn price tag should also not be taken too literally. One reason DeepSeek’s costs are low is that its offerings have so far focused solely on text-based large-language models, while some US rivals offer multimodal models that can handle images and video, making direct cost comparisons misleading. Another is that the company can piggyback on the costly earlier advances made, and lessons learned, by US AI firms. Finally, the much-cited figure doesn’t account for prior research and development spending. DeepSeek’s parent company reportedly had an initial R&D budget of around RMB3bn (roughly US$400mn at prevailing exchange rates), as well as a stockpile of about 10,000 of Nvidia’s advanced A100 chips, meaning the actual cost of development was almost certainly far higher than the headline number.
“Nonetheless, the company achieved an unambiguous innovation in software architecture that allowed it to deliver strong performance on many tasks at a low cost. That reflects a broader strategy among Chinese technology firms in response to US export controls: using software to get more out of less-advanced hardware. A 2023 review of Tencent’s Hunyuan AI model by the Berkeley AI Research Lab, for instance, concluded that ‘[s]oftware advancements are making old hardware increasingly useful.’”
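One concrete instance of the generic ‘more out of less-advanced hardware’ idea is post-training quantization, which stores weights at lower precision so the same model fits and runs on more modest chips. The sketch below uses PyTorch’s built-in dynamic quantization on a toy model; it is an illustration of the general technique only, not the specific method used by any company named above.

```python
# Minimal sketch of one generic software technique for stretching modest
# hardware: post-training dynamic quantization. Illustrative only; not the
# specific method of any company mentioned in the article.
import torch
import torch.nn as nn

# Toy float32 model standing in for a much larger network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Replace Linear layers with versions whose weights are stored as 8-bit
# integers, cutting their memory footprint roughly 4x.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10]); same interface, smaller weights
```

Techniques in this family (quantization, mixed precision, leaner attention kernels) all trade a little numerical fidelity or engineering effort for large savings in memory and compute, which is what makes older accelerators ‘increasingly useful.’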