Highlights

  • A giant dataset of YouTube subtitles has, per a new investigation, been used to train countless AI models without the permission of the tens of thousands of creators whose work was scraped.
  • As Wired reports with the help of the data-driven Proof News project, a dataset known as “YouTube Subtitles” has been used by everyone from Apple and Anthropic to Nvidia and Salesforce to train AI models since it was released in 2020.