Metadata
- Author: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
- Full Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- URL: https://readwise.io/reader/document_raw_content/58089842
Highlights
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model