High-quality training data is critical for developing reliable models. While LLMs help reduce the amount of human input needed to train an ML model, adding human feedback can significantly increase model quality. One tool that simplifies this process is the distilabel library, which leverages LLMs to supercharge labeling workflows. For text classification, the ArgillaLabeller task uses an LLM to label datasets hosted on Argilla, a modern, open-source, data-centric tool for improving AI datasets. This integration combines the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations of human experts.
The first step is to configure your Argilla dataset. This means defining the fields that will contain the data to be annotated, the labels, and the annotation guidelines. You have full flexibility to customize these elements so they align with your use case and project goals.
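For reference, a minimal sketch of this configuration using the Argilla Python SDK might look like the following; the API URL, API key, dataset name, field names, and labels are placeholders you would adapt to your own project.

```python
import argilla as rg

# Connect to your Argilla instance (URL and API key are placeholders)
client = rg.Argilla(api_url="http://localhost:6900", api_key="argilla.apikey")

# Define the field to annotate, the labels, and the annotation guidelines
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.LabelQuestion(
            name="sentiment",
            title="What is the sentiment of the review?",
            labels=["positive", "neutral", "negative"],
        )
    ],
    guidelines="Classify each product review as positive, neutral, or negative.",
)

# Create the dataset on the Argilla server
dataset = rg.Dataset(name="pc-components-reviews", settings=settings)
dataset.create()
```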
In the next step, distilabel will use this dataset and its configuration to auto-label it using an LLM. This means that, under the hood, distilabel will retrieve Argilla’s data and format it into a prompt template to guide the LLM in understanding the labeling task. You can find the prompt template in the Appendix for reference.
The next step is to run distilabel to start auto-labeling, as shown in the video below:
To label records with an LLM, you need to set up ArgillaLabeller with the sentiment classification dataset. Optionally, you can add some example records for a few-shot setting.
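As a sketch, assuming the Argilla client and dataset created above, the setup could look like this; note that the import path for the LLM class (distilabel.models vs. distilabel.llms) depends on your distilabel version, and the model ID, record limit, and names are illustrative.

```python
from distilabel.models import InferenceEndpointsLLM  # distilabel.llms in older versions
from distilabel.steps.tasks import ArgillaLabeller

# Retrieve the sentiment dataset configured earlier
dataset = client.datasets("pc-components-reviews")

# Optionally, use a few already-annotated records as few-shot examples
example_records = list(dataset.records(limit=3))

labeller = ArgillaLabeller(
    llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
    fields=[dataset.settings.fields["text"]],
    question=dataset.settings.questions["sentiment"],
    example_records=example_records,
    guidelines=dataset.settings.guidelines,
)
labeller.load()
```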
Once you run your ArgillaLabeller with your dataset, it will automatically label it. The suggested labels will be available in the Argilla UI for human review.
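Continuing the sketch above, labeling pending records and logging the LLM's suggestions back to Argilla could look roughly like this; the exact shape of the task's output dictionary may differ across distilabel versions, so treat the field access as an assumption.

```python
# Fetch the records that still need a label
pending_records = list(dataset.records(limit=100))

# Ask the LLM for a suggested label for each record
results = next(labeller.process([{"record": record} for record in pending_records]))

# Attach the suggestions and push them back to Argilla for human review
for record, output in zip(pending_records, results):
    record.suggestions.add(rg.Suggestion(**output["suggestion"]))
dataset.records.log(pending_records)
```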
This approach simplifies the labeling process, allowing you to label data automatically in near real time while refining it with human feedback. It is powerful and versatile, providing good performance out of the box, especially for straightforward workflows. By automating much of the process, you save the time and effort of days of manual labeling while still keeping humans in the loop, making it an efficient way to build high-quality datasets for your project.
The final step is to train a specialized model with the annotated data, which has been auto-labeled by an LLM and then improved with human feedback via the Argilla UI. The goal is to train a small, specialized model that is optimized for your use case, without relying on an LLM for inference. In this example, we will use SetFit, a powerful few-shot text classification library.
With just a few lines of code and the annotated data from Argilla, you can easily train your classifier. The result is a working model trained efficiently on high-quality, domain-specific data.
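A minimal SetFit training sketch, assuming the reviewed annotations have been exported from Argilla into text/label pairs, might look like this; the flattened column names and the bge-micro-v2 checkpoint ID are assumptions you should verify against your own dataset.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Export human-reviewed records from Argilla (flattened column names are assumptions)
rows = dataset.records.to_list(flatten=True)
train_dataset = Dataset.from_list(
    [
        {"text": row["text"], "label": row["sentiment.responses"][0]}
        for row in rows
        if row.get("sentiment.responses")
    ]
)

# Fine-tune a small sentence-transformer backbone with SetFit
model = SetFitModel.from_pretrained("TaylorAI/bge-micro-v2")
trainer = Trainer(
    model=model,
    args=TrainingArguments(num_epochs=1, batch_size=16),
    train_dataset=train_dataset,
)
trainer.train()

# Inference runs locally, with no LLM in the loop
predictions = model.predict(["The GPU was faulty on arrival", "Great value for the price"])
```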
In the previous sections, we demonstrated how to use an LLM to label a dataset and obtain enough annotated samples to train a smaller model. However, questions may remain about the performance and accuracy of each model. To assess this, we conducted an experiment comparing both approaches. Specifically, we used the argilla/pc-components-reviews dataset and compared the performance of both models across different numbers of samples.
First, we used an LLM (Llama 3.1 8B) to classify product reviews as positive, negative, or neutral in sentiment. The LLM was given a description of the task from the annotation guidelines and an increasing number of examples per class, from 0 to 6. Then, we trained a classifier with the SetFit library using the bge-micro-v2 model. This approach does not use prompts; instead, we trained the classifier with an increasing number of samples per class, again from 0 to 6. In both cases, we measured accuracy on the remaining 122 samples from the pc-components-reviews dataset.
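A rough sketch of the SetFit side of this comparison is shown below; the split and column names of argilla/pc-components-reviews, the checkpoint ID, and the evaluation set are assumptions, so treat it as an outline of the protocol rather than the exact experiment code.

```python
from collections import defaultdict

from datasets import Dataset, load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Assumed split and column names ("text", "label")
reviews = load_dataset("argilla/pc-components-reviews", split="train")

def sample_per_class(dataset, n):
    """Take the first n examples of each sentiment class."""
    buckets = defaultdict(list)
    for row in dataset:
        if len(buckets[row["label"]]) < n:
            buckets[row["label"]].append(row)
    return Dataset.from_list([row for rows in buckets.values() for row in rows])

for n in range(1, 7):
    train_ds = sample_per_class(reviews, n)
    model = SetFitModel.from_pretrained("TaylorAI/bge-micro-v2")
    trainer = Trainer(
        model=model,
        args=TrainingArguments(num_epochs=1, batch_size=16),
        train_dataset=train_ds,
        eval_dataset=reviews,  # in practice, evaluate only on the held-out 122 reviews
    )
    trainer.train()
    print(n, trainer.evaluate())
```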
As shown below, the outcomes of our experiment suggest that the SetFit model outperforms the zero-shot Llama-3.1-8B, especially for few-shot classification. This highlights the value of LLMs for initial data annotation, where they generate high-quality synthetic labels. Once the initial dataset is annotated, smaller models like SetFit can refine and learn from this data, providing improved performance. This approach accelerates model development and allows more efficient deployment of classifiers in resource-constrained environments.