Gemini 1.5 Flash-8B is “a smaller and faster variant of 1.5 Flash” - and is now released to production, at half the price of the 1.5 Flash model.
It’s really, really cheap:
• $0.0375 per 1 million tokens on prompts <128K
• $0.15 per 1 million tokens on prompts >128K
• $0.01 per 1 million tokens on cached prompts <128K

I believe images are still charged at a flat rate of 258 tokens, which I think means a single non-cached image with Flash should cost 0.00097 cents - a number so tiny I'm doubting if I got the calculation right.

OpenAI's cheapest model remains GPT-4o mini, at $0.150/1M input - though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (and by half again if you use batches - Gemini also offer half-off for batched requests).
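That per-image arithmetic can be sanity-checked in a few lines. This is just a sketch of the calculation, assuming the figures quoted above ($0.0375 per 1M input tokens for Flash-8B, a flat 258 tokens per image):

```python
# Assumed figures from the post, not an official pricing API:
PRICE_PER_MILLION_TOKENS = 0.0375  # dollars, Gemini 1.5 Flash-8B input <128K
TOKENS_PER_IMAGE = 258             # flat token charge per image

cost_dollars = TOKENS_PER_IMAGE * PRICE_PER_MILLION_TOKENS / 1_000_000
cost_cents = cost_dollars * 100
print(f"{cost_cents:.5f} cents per image")  # ≈ 0.00097 cents
```

Which confirms the number: roughly a thousandth of a cent per image.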
Anthropic's cheapest model is still Claude 3 Haiku at $0.25/M, though that drops to $0.03/M for cached tokens (if you configure them correctly).