Two years is a very long time in the world of AI. And during that very long time, the consensus about the true nemesis of OpenAI has shifted several times. First, we all believed Anthropic was the true rival of OpenAI. That is understandable. Anthropic was founded by former members of OpenAI’s technical and leadership teams, reportedly over a disagreement about AI safety. Anthropic was thus the safety-focused alternative to OpenAI. Then, with the release of Claude 3.5 Sonnet, it became the company with the best coding model.
After Anthropic, Mistral briefly emerged as a strong contender due to its focus on open-source models, which contrasted with the proprietary approach of OpenAI and Anthropic. However, Mistral’s impact diminished as it shifted its strategy toward more proprietary models, likely for financial reasons. Anthropic came back on top as the main alternative to OpenAI soon after, with the release of Claude 3.5 Sonnet and Artifacts. It also captured significant mindshare by pushing innovative concepts like **computer use** and the Model Context Protocol. At the same time, OpenAI kept its lead with the release of powerful reasoning models like o1.
DeepSeek has rapidly risen to the forefront of AI with the launch of DeepSeek-V3, an open-source model that outperforms GPT-4o and Claude 3.5 Sonnet. This achievement is even more remarkable considering it was developed at a fraction of the cost (reportedly less than $6 million).
Let’s say you want to fine-tune a coding LLM so it excels at questions about your proprietary codebase. The first step would be to gather all the code documentation available. This includes API references, internal developer guides, inline code comments, and any architectural diagrams that provide insight into the system.
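As a rough illustration of that first step, here is a minimal sketch that walks a repository and gathers documentation files plus Python docstrings into a flat list of records. The function names, the record layout, and the set of file suffixes are assumptions made for this example, not part of any particular tool; a real codebase would likely need more languages and more document types.

```python
import ast
from pathlib import Path

# Assumed documentation file types; adjust for your own repository.
DOC_SUFFIXES = {".md", ".rst", ".txt"}


def collect_docs(repo_root):
    """Return a list of {"source", "kind", "text"} records found under repo_root."""
    records = []
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix in DOC_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            records.append({"source": str(path), "kind": "doc", "text": text})
        elif path.suffix == ".py":
            records.extend(python_docstrings(path))
    return records


def python_docstrings(path):
    """Extract module, class, and function docstrings from one Python file."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
    except (SyntaxError, ValueError):
        return []  # skip files that do not parse
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                name = getattr(node, "name", "<module>")
                found.append({"source": f"{path}:{name}",
                              "kind": "docstring", "text": doc})
    return found
```

Calling `collect_docs("path/to/repo")` returns records that can be passed to whatever cleaning step follows. Architectural diagrams are not covered here, since they usually need to be described in text by hand.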
Next, you should collect a dataset of relevant code snippets, bug reports, and past developer interactions that can help the model understand real-world usage patterns. Cleaning and structuring this data is essential to remove redundant or misleading information.
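To make the cleaning and structuring step concrete, here is a minimal sketch that drops near-empty and duplicate examples and serializes the rest as JSON Lines. The `prompt`/`completion` field names, the `min_chars` threshold, and the helper names are assumptions for illustration; the right schema depends on the fine-tuning tool you end up using.

```python
import hashlib
import json


def normalize(text):
    """Collapse whitespace so trivially different copies hash identically."""
    return " ".join(text.split())


def clean_examples(examples, min_chars=20):
    """Drop near-empty and duplicate (prompt, completion) pairs, keeping order."""
    seen = set()
    cleaned = []
    for ex in examples:
        prompt = normalize(ex.get("prompt", ""))
        completion = ex.get("completion", "").strip()
        if len(prompt) + len(completion) < min_chars:
            continue  # too short to be a useful training example
        key_text = prompt + "\x00" + normalize(completion)
        key = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if key in seen:
            continue  # same content already kept
        seen.add(key)
        cleaned.append({"prompt": prompt, "completion": completion})
    return cleaned


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
```

This only catches exact duplicates after whitespace normalization. Misleading or outdated examples, which the text also warns about, still need review by someone who knows the codebase.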
Once the data is prepared, you can fine-tune an open-weight model like DeepSeek-Coder-V2 or even DeepSeek-V3 using techniques like LoRA (Low-Rank Adaptation) to efficiently adapt the base model to your specific coding style and architecture. Hosting the fine-tuned model on a cloud platform like Nebius AI Studio allows for easy deployment and integration into your development workflow.
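To show why LoRA counts as an efficient technique, here is a small pure-Python sketch of its core idea: the frozen weight matrix W (shape d × k) is left untouched, and only two small matrices B (d × r) and A (r × k) are trained, giving an effective weight of W + (alpha / r) · B·A. The function names are made up for this example, and it is not tied to DeepSeek or to any training library; in practice you would normally rely on an existing LoRA implementation rather than writing this by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, k); B has shape (d, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d, k, r):
    """Compare trainable parameter counts for one d x k weight matrix."""
    return {"full": d * k, "lora": r * (d + k)}


if __name__ == "__main__":
    # For a 4096 x 4096 matrix and rank 8, LoRA trains 65,536 values
    # instead of 16,777,216.
    print(trainable_params(4096, 4096, 8))
```

Because B is conventionally initialized to zeros, the adapted model starts out identical to the base model, and training only has to learn the low-rank difference.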
DeepSeek R1 is MIT-licensed and is truly “open AI”, unlike OpenAI, which is open in name only.