The WizardLM paper proposes a new method, Evol-Instruct, to synthetically create a dataset with open-domain instructions of varying complexity using gpt-3.5-turbo. The resulting dataset, combined with the original, was used to fine-tune LLaMa, leading to the creation of WizardLM. This model surpasses ChatGPT in both human and automatic evaluations, demonstrating more than 90% of ChatGPT’s capabilities in 17 out of 29 skills. (View Highlight)
The WizardLM paper proposes a new method, Evol-Instruct, to synthetically create a dataset with open-domain instructions of varying complexity using gpt-3.5-turbo. The resulting dataset, combined with the original, was used to fine-tune LLaMa, leading to the creation of WizardLM. This model surpasses ChatGPT in both human and automatic evaluations, demonstrating more than 90% of ChatGPT’s capabilities in 17 out of 29 skills. (View Highlight)
In this tutorial, we will only focus on the Evol-Instruct approach to create a more complex dataset. (View Highlight)
Intruction Evolving: Use gpt-3.5-turbo with predefined prompts to generate the evolved instructions. These prompts can be of two types: in-depth evolving (includes adding constraints, deepening, concretizing, increasing reasoning, and complicating the input) and in-breadth evolving (includes mutation). The complicating prompt is the only one not applied as it needs in-context examples. Then, only one of the remaining five is selected randomly to be applied to the input instruction (View Highlight)
Intruction Evolving: Use gpt-3.5-turbo with predefined prompts to generate the evolved instructions. These prompts can be of two types: in-depth evolving (includes adding constraints, deepening, concretizing, increasing reasoning, and complicating the input) and in-breadth evolving (includes mutation). The complicating prompt is the only one not applied as it needs in-context examples. Then, only one of the remaining five is selected randomly to be applied to the input instruction. You can check the original code here.
Elimination Evolving
The instruction evolving step may fail, so the new instructions are filtered according to the following criteria:
The evolved instruction does not provide any information gain. Automatically evaluated with ChatGPT.
The evolved instruction contains “sorry” and is less than 80 words.
The evolved instruction only contains punctuation and stop words.
The evolved instruction copies words from the evolving prompt.
If the evolved instruction passes the previous criteria, it is added to the pool of new instructions and also will be used as input for the next iteration. If not, it is dropped and the original instruction is the one used for the next iteration.
Once, the evolved instructions are generated, they use the same LLM to generate the corresponding responses. Finally, the resulting dataset is the combination of the original and the new instructions generated in each epoch. (View Highlight)
On the other hand, the Deita paper proposes more strategies to select the best data for alignment. While using the Evol-Instruct approach, but without the breadth evolving step, what they called Evol-Complexity. They also applied the Evol-quality and Data selection strategies.
• The Evol-quality is similar to Evol-Complexity, although it uses a different prompt, which is focused on improving the quality of the responses by enhancing helpfulness, augmenting relevance, enriching depth, fostering creativity, and supplying additional details, to generate new pairs.
• The Data Selection strategy filters the new instructions using embeddings and cosine similarity to the original instructions to select the best and most diverse ones. (View Highlight)
The first step is to prepare the initial dataset that will be used for the evolution process. Following the same idea as shown in an example from the paper, we will use the well-known alpaca dataset available in HuggingFace. For the sake of this tutorial’s example, we will use 5 samples.
Good to mention that other datasets like the distilabel-intel-orca-dpo-pairs, a “distilabeled” version of orca_dpo_pairs for preference tuning with 12.9K samples, were also applied as the seed dataset. However, the instructions were already too complex, so the evolution process generated a small amount of instructions that were of poor-quality or with hallucinations. (View Highlight)
The Evol-Complexity approach¶
For our case, we will need to set two different LLMs with their corresponding tasks: one for the instruction evolving and another for the elimination evolving step 1. (View Highlight)
For our case, we will need to set two different LLMs with their corresponding tasks: one for the instruction evolving and another for the elimination evolving step 1.
Instruction Evolving LLM¶
The next step is to define the LLM that will be used to generate the evolved instructions. We will use gpt-3.5-turbo as the language model, and the task EvolComplexityTask, also we will set some parameters (Section 4.3 from WizardLM). Take into account that the EvolComplexity will perform the random selection of the evolving prompt and the filtering of the evolved instructions up the first step from the elimination evolving related to equal prompts. (View Highlight)
Instruction Evolving LLM¶
The next step is to define the LLM that will be used to generate the evolved instructions. We will use gpt-3.5-turbo as the language model, and the task EvolComplexityTask, also we will set some parameters (Section 4.3 from WizardLM). Take into account that the EvolComplexity will perform the random selection of the evolving prompt and the filtering of the evolved instructions up the first step from the elimination evolving related to equal prompts (View Highlight)
Elimination Evolving LLM¶
As part of the elimination step, it was stated to ask ChatGPT if the original prompt and the evolved one from the current epoch are equal. In order to do so, we will need to define a LLM with the corresponding task. As the task does not exist, we will customize one based on TextGenerationTask from distilabel indicating how to generate the prompt and parse the output. (View Highlight)