7 Ways to Speed Up Inference of Your Hosted LLMs
Metadata
- Author: Sergei Savvov
- Full Title: 7 Ways to Speed Up Inference of Your Hosted LLMs
- URL: https://slgero.medium.com/speed-up-llm-inference-83653aa24c47
Highlights
- tl;dr: seven techniques to speed up LLM inference, increasing token generation speed and reducing memory consumption: Mixed-Precision, Bfloat16, Quantization, Fine-tuning with Adapters, Pruning, Continuous Batching, and Multiple GPUs.
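- The first two techniques (mixed-precision and bfloat16) usually come down to the dtype you load the model with. Below is a minimal sketch, assuming a Hugging Face transformers stack; the model name is illustrative, not from the article, and `device_map="auto"` (which requires the accelerate package) also gestures at the multiple-GPUs technique:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model name; substitute the model you actually host.
model_name = "facebook/opt-1.3b"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load weights in bfloat16 instead of float32: roughly halves memory
# use and speeds up inference on hardware with native bf16 support.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available GPUs
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.inference_mode():  # disable autograd bookkeeping for generation
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```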