We introduce and opensource WizardLM-2, our next generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning and agent. New family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B. (View Highlight)
Since then, Evol-Instruct and Instruction&Process Supervised Reinforcement Learning (RLEIF) have become fundamental technologies for GenAI community. Recently, we have further optimized our methods and data quality, resulting in significant performance improvements, the outcome is WizardLM-2. (View Highlight)
WizardLM-2 8x22B is our most advanced model, and the best opensource LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice in the same size. WizardLM-2 7B is the fastest and achieves comparable performance with existing 10x larger opensource leading models. (View Highlight)
As the natural world’s human-generated data becomes increasingly exhausted through LLM training, we believe that: the data carefully created by AI and the model step-by-step supervised by AI will be the sole path towards more powerful AI. (View Highlight)
Human Preferences Evaluation
We carefully collected a complex and challenging set consisting of real-world instructions, which includes main requirements of humanity, such as writing, coding, math, reasoning, agent, and multilingual. We perform a blind pairwise comparison between WizardLM-2 and baselines. To each annotator, responses from all models are presented, which are randomly shuffled to hide their sources. We report the win:loss rate without tie: (View Highlight)
The model weights of WizardLM-2 8x22B and WizardLM-2 7B are shared on Huggingface, and WizardLM-2 70B and the demo of all the models will be available in the coming days. Please use the same system prompts strictly with us to guarantee the generation quality. (View Highlight)