7 Ways to Speed Up Inference of Your Hosted LLMs

Highlights

  • tl;dr: seven techniques to speed up inference of hosted LLMs by increasing token generation speed and reducing memory consumption: Mixed-Precision, Bfloat16, Quantization, Fine-tuning with Adapters, Pruning, Continuous Batching, and Multiple GPUs. (View Highlight)
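
The highlight only names the techniques; as a quick illustration, here is a minimal sketch (not from the article) of two of them, loading weights in bfloat16 and running the forward pass under mixed precision, using the Hugging Face transformers API. The model name "gpt2", the prompt, and the generation settings are illustrative assumptions.

```python
# Minimal sketch: bfloat16 weights + mixed-precision inference.
# Assumes a CUDA GPU and the transformers/torch packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Bfloat16: load weights at half the memory footprint of float32.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.to("cuda").eval()

inputs = tokenizer("Speeding up LLM inference is", return_tensors="pt").to("cuda")

# Mixed precision: compute in bfloat16 where it is numerically safe.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The other techniques in the list (quantization, adapters, pruning, continuous batching, multi-GPU) typically build on the same loading and generation path, swapping in lower-precision weight formats or different serving infrastructure.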