
Metadata

  • Author: AlphaSignal
  • Full Title: Ⓜ️ New Mistral Large Beats Every LLM - GPT4

Highlights

  • After several weeks of speculation about a new model, Mistral AI has officially released Large, its latest and most performant model, along with Le Chat, a beta of their chat UI. (View Highlight)
  • Mistral Large boasts a performance of 81.2% on MMLU (measuring massive multitask language understanding), beating Claude 2, Gemini Pro and Llama-2-70B. Large is particularly good at common sense and reasoning, with 94.2% accuracy on the ARC Challenge (5-shot). (View Highlight)
  • Mistral Small was also updated on the API and is now faster and more performant than Mixtral 8x7B. (View Highlight)
  • Mistral Large was trained on English, French, Spanish and Italian datasets for native multilingual capabilities. (View Highlight)
  • The JSON format mode forces the language model's output to be valid JSON. This enables the extraction of information in a structured format that developers can consume directly. (View Highlight)
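The JSON-mode request above can be sketched as follows. This is a minimal illustration, not an official client: the endpoint URL and the `response_format={"type": "json_object"}` parameter follow Mistral's API documentation at release, while the prompt, model alias, and helper names are assumptions for the example.

```python
import json

# Chat-completions endpoint as documented by Mistral at release.
API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_payload(prompt: str, model: str = "mistral-large-latest") -> dict:
    """Build a chat-completion request that asks for JSON-only output.

    The response_format field is the switch that forces the model to
    emit valid JSON (per Mistral's API docs).
    """
    return {
        "model": model,  # model alias is an assumption for this sketch
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }


def extract_json(response_body: str) -> dict:
    """Parse the assistant message out of a raw API response body.

    Because JSON mode guarantees well-formed output, the inner
    json.loads should not raise for a successful call.
    """
    body = json.loads(response_body)
    content = body["choices"][0]["message"]["content"]
    return json.loads(content)
```

To actually call the API you would POST `build_payload(...)` to `API_URL` with an `Authorization: Bearer <key>` header (using `urllib.request` or any HTTP client), then pass the response text to `extract_json`.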
  • Phind 70B not only beats GPT-4 at coding, but it also runs 4x faster. The Phind team, which released the model yesterday, claims that it offers the best user experience thanks to its speed (80 tokens per second), performance (82.3% on HumanEval) and response style (less “lazy” than GPT-4, with detailed code examples). (View Highlight)
  • Google presented Genie, Generative Interactive Environments, an 11B parameter model capable of generating action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. Genie was trained in an unsupervised manner from unlabelled Internet gaming videos, without any action or text annotations. (View Highlight)
  • Google has temporarily disabled the people-image generation feature of its Gemini AI due to historical inaccuracies. Gemini, built on the Imagen 2 model, incorrectly depicted historical figures, leading to a broad discussion on diversity and historical context within AI outputs. These inaccuracies, such as misrepresenting the U.S. Founding Fathers with diverse racial backgrounds, have raised concerns over the AI’s handling of historical accuracy versus diversity initiatives. (View Highlight)
  • Google has acknowledged the issue, attributing it to the model’s oversensitivity and the complex nature of training data biases. The company plans to refine Gemini’s algorithm to improve context awareness and ensure a balance between diversity and historical accuracy. This incident highlights the ongoing struggle within AI development to achieve ethical representation and the necessity for continuous improvement and oversight in AI systems. (View Highlight)