Highlights

  • Smaller Models Get Better: In 2022, the smallest model scoring above 60% on the Massive Multitask Language Understanding (MMLU) benchmark was PaLM, with 540 billion parameters. By 2024, Microsoft’s Phi-3-mini, with just 3.8 billion parameters, cleared the same threshold. This represents a 142-fold reduction in a little over two years.
  • Models Become Cheaper to Use: The cost of querying an AI model that scores the equivalent of GPT-3.5 (64.8% accuracy) on MMLU dropped from $20.00 per million tokens in November 2022 to $0.07 per million tokens by October 2024 (Gemini-1.5-Flash-8B), a more than 280-fold reduction in approximately 18 months. Depending on the task, LLM inference prices have fallen anywhere from 9 to 900 times per year.
  • China’s Models Catch Up: The U.S. still leads in producing top AI models, but China is closing the performance gap. In 2024, U.S.-based institutions produced 40 notable AI models, compared to China’s 15 and Europe’s three. While the U.S. maintains its lead in quantity, Chinese models have rapidly closed the quality gap: performance differences on major benchmarks such as MMLU and HumanEval shrank from double digits in 2023 to near parity in 2024. China also continues to lead in AI publications and patents.
  • A Jump in Problematic AI: According to one index tracking AI harm, the AI Incident Database, the number of AI-related incidents rose to 233 in 2024, a record high and a 56.4% increase over 2023. Among the incidents reported were deepfake intimate images and chatbots allegedly implicated in a teenager’s suicide. While the database isn’t comprehensive, it does show a staggering increase in issues.
  • The Rise of More Useful Agents: AI agents show early promise. The launch of RE-Bench in 2024 introduced a rigorous benchmark for evaluating complex tasks for AI agents. In short time-horizon settings (two hours), top AI systems score four times higher than human experts, but when given more time to do a task, humans perform better than AI, outscoring it 2-to-1 at 32 hours. Still, AI agents already match human expertise in select tasks, such as writing specific types of code, while delivering results faster.
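  • Sky-High AI Investment: The U.S. widened its commanding lead in global AI investment. U.S. private AI investment hit $109 billion in 2024, nearly 12 times China’s $9.3 billion and 24 times the UK’s $4.5 billion. The gap is even more pronounced in generative AI, where U.S. investment exceeded the combined European Union and China total by $25.5 billion, up from a $21.1 billion gap in 2023.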
  • AI Goes Corporate: Businesses are turning to AI. In 2024, the proportion of survey respondents reporting AI use by their organizations jumped to 78% from 55% in 2023. Similarly, the share of respondents who reported using generative AI in at least one business function more than doubled, from 33% in 2023 to 71% in 2024.
  • Health AI Floods the FDA: The number of FDA-approved, AI-enabled medical devices skyrocketed. The FDA authorized its first AI-enabled medical device in 1995. By 2015, only six such devices had been approved, but the number spiked to 223 by 2023.