Metadata

Highlights

  • Gemini 1.5 Flash-8B is “a smaller and faster variant of 1.5 Flash” - and is now released to production, at half the price of the 1.5 Flash model. It’s really, really cheap: • $0.15 per 1 million tokens on prompts >128K • $0.150/1M input - though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (and by half again if you use batches - Gemini also offer half-off for batched requests). Anthropic’s cheapest model is still Claude 3 Haiku at $0.03/M for cached tokens (if you configure them correctly).