Gemini 1.5 Flash-8B is “a smaller and faster variant of 1.5 Flash” - and is now released to production, at half the price of the 1.5 Flash model.
It’s really, really cheap:
• $0.0375 per 1 million tokens on prompts <128K
• $0.15 per 1 million tokens on prompts >128K
• $0.01 per 1 million tokens on cached prompts <128K

I believe images are still charged at a flat rate of 258 tokens, which I think means a single non-cached image with Flash should cost 0.00097 cents - a number so tiny I'm doubting if I got the calculation right.

OpenAI's cheapest model remains GPT-4o mini, at $0.150/1M input - though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (and by half again if you use batches - Gemini also offer half-off for batched requests).
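That per-image arithmetic can be sanity-checked in a few lines. This is just a sketch of the calculation, assuming the figures quoted above ($0.0375 per 1M input tokens for Flash-8B, a flat 258 tokens per image):

```python
# Assumed figures from the post, not an official pricing API:
PRICE_PER_MILLION_TOKENS = 0.0375  # dollars, Gemini 1.5 Flash-8B input <128K
TOKENS_PER_IMAGE = 258             # flat token charge per image

cost_dollars = TOKENS_PER_IMAGE * PRICE_PER_MILLION_TOKENS / 1_000_000
cost_cents = cost_dollars * 100
print(f"{cost_cents:.5f} cents per image")  # ≈ 0.00097 cents
```

Which confirms the number: roughly a thousandth of a cent per image.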
Anthropic's cheapest model is still Claude 3 Haiku at $0.25/M, though that drops to $0.03/M for cached tokens (if you configure them correctly).