Thunder speeds up PyTorch Large Language Model (LLM) training by as much as 40%, as demonstrated on workloads such as Llama 2 7B training.
To apply Thunder to a PyTorch model, wrap it with thunder.jit(). The compiled model also works in multi-GPU setups via Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP), as in the sketch below.
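A minimal sketch of the single-GPU case (the model, shapes, and data here are placeholders; the DDP/FSDP wiring follows the usual PyTorch patterns and is omitted):

```python
import torch
import torch.nn as nn
import thunder

# A small stand-in model; in practice this would be an LLM such as Llama 2 7B.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Compile the model with Thunder. The returned module is called like the original.
jitted_model = thunder.jit(model)

x = torch.randn(8, 1024)
out = jitted_model(x)  # the first call traces and optimizes; later calls reuse that work
```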
Under the hood, Thunder dispatches work to optimized executors such as nvFuser, torch.compile, cuDNN, and TransformerEngine FP8, improving performance on both single- and multi-accelerator setups, and it integrates with PyTorch's standard operations and autograd.
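As a rough illustration of that autograd integration, a Thunder-compiled module can sit inside an ordinary PyTorch training step. This assumes the jitted module shares parameters with the original module, so the optimizer can be built from either one:

```python
import torch
import torch.nn as nn
import thunder

model = nn.Linear(512, 512)
jitted_model = thunder.jit(model)

# The jitted module is assumed to share the original module's parameters,
# so the standard optimizer setup is unchanged.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(16, 512)
target = torch.randn(16, 512)

loss = nn.functional.mse_loss(jitted_model(x), target)
loss.backward()        # gradients flow through PyTorch autograd as usual
optimizer.step()
optimizer.zero_grad()
```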