Full Title: Tumbling Training Costs, Desktop AI Supercomputer, Tighter AI Export Restrictions, Improved Contrastive Loss
Highlights
Writing software, especially prototypes, is becoming cheaper. This will lead to increased demand for people who can decide what to build. AI Product Management has a bright future! (View Highlight)
Software is often written by teams that comprise Product Managers (PMs), who decide what to build (such as what features to implement for what users) and Software Developers, who write the code to build the product. Economics shows that when two goods are complements — such as cars (with internal-combustion engines) and gasoline — falling prices in one leads to higher demand for the other. For example, as cars became cheaper, more people bought them, which led to increased demand for gas. Something similar will happen in software. Given a clear specification for what to build, AI is making the building itself much faster and cheaper. This will significantly increase demand for people who can come up with clear specs for valuable things to build. (View Highlight)
This is why I’m excited about the future of Product Management, the discipline of developing and managing software products. I’m especially excited about the future of AI Product Management, the discipline of developing and managing AI software products. (View Highlight)
Many companies have an Engineer:PM ratio of, say, 6:1. (The ratio varies widely by company and industry, and anywhere from 4:1 to 10:1 is typical.) As coding becomes more efficient, I think teams will need more product management work (as well as design work) as a fraction of the total workforce. Perhaps engineers will step in to do some of this work, but if it remains the purview of specialized Product Managers, then the demand for these roles will grow. (View Highlight)
This change in the composition of software development teams is not yet moving forward at full speed. One major force slowing this shift, particularly in AI Product Management, is that Software Engineers, being technical, are understanding and embracing AI much faster than Product Managers. Even today, most companies have difficulty finding people who know how to develop products and also understand AI, and I expect this shortage to grow. (View Highlight)
Further, AI Product Management requires a different set of skills than traditional software Product Management. It requires:
• Technical proficiency in AI. PMs need to understand what products might be technically feasible to build. They also need to understand the lifecycle of AI projects, such as data collection, building, then monitoring, and maintenance of AI models.
• Iterative development. Because AI development is much more iterative than traditional software and requires more course corrections along the way, PMs need to understand how to manage such a process.
• Data proficiency. AI products often learn from data, and they can be designed to generate richer forms of data than traditional software.
• Skill in managing ambiguity. Because AI’s performance is hard to predict in advance, PMs need to be comfortable with this and have tactics to manage it.
• Ongoing learning. AI technology is advancing rapidly. PMs, like everyone else who aims to make best use of the technology, need to keep up with the latest technology advances, product ideas, and how they fit into users’ lives. (View Highlight)
Finally, AI Product Managers will need to know how to ensure that AI is implemented responsibly (for example, when we need to implement guardrails to prevent bad outcomes), and also be skilled at gathering feedback fast to keep projects moving. Increasingly, I also expect strong product managers to be able to build prototypes for themselves . (View Highlight)
The demand for good AI Product Managers will be huge. In addition to growing AI Product Management as a discipline, perhaps some engineers will also end up doing more product management work. (View Highlight)
New highlights added January 16, 2025 at 3:29 PM
DeepSeek-V3 is an open large language model that outperforms Llama 3.1 405B and GPT-4o on key benchmarks and achieves exceptional scores in coding and math. The weights are open except for applications that involve military uses, harming minors, generating false information, and similar restrictions. You can download them here. (View Highlight)
DeepSeek-V3 is a mixture-of-experts (MoE) transformer that comprises 671 billion parameters, of which 37 billion are active at any moment. The team trained the model in 2.79 million GPU hours — less than 1/10 the time required to train Llama 3.1 405B, which DeepSeek-V3 substantially outperforms — at an extraordinarily low cost of $5.6 million. (View Highlight)
In DeepSeek’s tests, DeepSeek-V3 outperformed Llama 3.1 405B and Qwen 2.5 72B across the board, and its performance compared favorably with that of GPT-4o.
• DeepSeek-V3 showed exceptional performance in coding and math tasks. In coding, DeepSeek-V3 dominated in five of the seven benchmarks tested. However, DeepSeek-V3 lost to o1 on one of the five, according to a public leaderboard. Specifically, on Polyglot, which tests a model’s ability to generate code in response to difficult requests in multiple programming languages, DeepSeek-V3 (48.5 percent accuracy) beat Claude Sonnet 3.5 (45.3 percent accuracy), though it lost to o1 (61.7 percent accuracy).
• In language tasks, it performed neck-and-neck with Claude 3.5 Sonnet, achieving higher scores in some tasks and lower in others. (View Highlight)
OpenAI’s o1 models excel thanks to agentic workflows in which they reflect on their own outputs, use tools, and so on. DeepSeek swims against the tide and achieves superior results without relying on agentic workflows. (View Highlight)
Open models continue to challenge closed models, giving developers high-quality options that they can modify and deploy at will. But the larger story is DeepSeek-V3’s shockingly low training cost. The team doesn’t explain precisely how the model achieves outstanding performance with such a low processing budget. (The paper credits “meticulous engineering optimizations.”) But it’s likely that DeepSeek’s steady refinement of MoE is a key factor. DeepSeek-V2, also an MoE model, saved more than 40 percent in training versus the earlier DeepSeek 67B, which didn’t employ MoE. In 2022, Microsoft found that MoE cost five times less in training for equal performance compared to a dense model, and Google and Meta reported that MoE achieved better performance than dense models trained on the same numbers of tokens. (View Highlight)
If they can be replicated, DeepSeek’s results have significant implications for the economics of training foundation models. If indeed it now costs around $5 million to build a GPT-4o-level model, more teams will be able to train such models, and the cost of competing with the AI giants could fall dramatically. (View Highlight)
The United States proposed limits on exports of AI technology that would dramatically expand previous restrictions, creating a new international hierarchy for access to advanced chips and models. (View Highlight)
The Biden administration, which will transition to leadership under incoming President Trump next week, issued new rules that restrict exports of AI chips and models to most countries beyond a select group of close allies. The rules, which are not yet final, would create a three-tier system that limits exports to a number of close allies and blocks access entirely to China, Iran, North Korea, Russia, and others. They also would introduce the U.S.’ first-ever restrictions on exporting closed weights for large AI models. (View Highlight)
The restrictions were announced shortly after a leak reached the press. A public comment period of 120 days will enable the incoming U.S. Presidential administration to consider input from the business and diplomatic communities and modify the rules before they take effect. The rules are scheduled to take effect in one year. (View Highlight)
• A new hierarchy divides nations into three groups that would have different degrees of access to AI chips both designed in the U.S. and manufactured abroad using U.S. technology, as well as proprietary AI models.
• Tier 1: Australia, Japan, Taiwan, the United Kingdom, and most of Europe would retain nearly unrestricted access. However, these nations must keep 75 percent of their AI computing power within allied countries. No more than 10 percent can be transferred to any single country outside this group to ensure that advanced AI development remains concentrated among close U.S. allies.
• Tier 2: Traditional U.S. allies and trade partners like Israel, Saudi Arabia, and Singapore face an initial cap of 507 million units of total processing power (TPP) — roughly the computational capacity of 32,000 Nvidia H100 chips — through the first quarter of 2025. The cap would increase to 1.02 billion TPP by 2027. U.S. companies that operate in these countries can apply for higher limits: 633 million TPP in Q1 2025, rising to 5.064 billion TPP by Q1 2027.
• Tier 3: China, Russia, and around two dozen other countries are blocked from receiving advanced AI chips, model weights, and specialized knowledge related to these systems. (View Highlight)
The proposed rules build on 2022’s CHIPS and Science Act, which was designed to strengthen domestic semiconductor production and restrict technologies abroad that could bear on U.S. security. An initial round of restrictions in late 2022 barred semiconductor suppliers AMD and Nvidia from selling advanced chips to Chinese firms. In November 2024, the U.S. tightened restrictions further, ordering Taiwan Semiconductor Manufacturing Company, which fabricates those chips, to halt production of advanced chips destined for China. (View Highlight)
In addition, President Biden issued an executive order to encourage the rapid build-out of computing infrastructure for AI. The federal government will hold competitions among private companies to lease sites it owns for the building of data centers at private expense. The selection of sites will take into account availability of sources of clean energy, including support for nuclear energy. The government will expedite permitting on these sites and support development of energy transmission lines around them. It will also encourage international allies to invest in AI infrastructure powered by clean energy. (View Highlight)
Protecting the United States’ advantages in high tech has been a rising priority for the White House over the past decade. The earlier export restrictions forced many Chinese AI developers to rely on less-powerful hardware. The new limits are likely to have a far broader impact. They could force developers in the Tier 2 and Tier 3 countries to build less resource-intensive models and lead them to collaborate more closely with each other, reducing the value of U.S.-made technology worldwide. They could hurt U.S. chip vendors, which have warned that the rules could weaken U.S. competitiveness in the global economy. They could also force companies that are building huge data centers to process AI calculations to reconsider their plans. (View Highlight)
The Biden administration’s embargo on AI chips has been leaky. So far, it has slowed down adversaries only slightly while spurring significant investment in potential suppliers that aren’t connected to the U.S. While the public comment period invites lobbying and industry feedback, ultimately geopolitical priorities may hold sway. Whatever the outcome, reducing the world’s dependence on U.S. chips and models would result in a very different global AI ecosystem. (View Highlight)