Pelayo Arbués

Direct Preference Optimization: Your language model is secretly a reward model

Apr 16, 2025 · 1 min read

articles
literature-note

Metadata

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Full Title: Direct Preference Optimization: Your language model is secretly a reward model
URL: https://readwise.io/reader/document_raw_content/58089842

Highlights

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
