BERT is a deep-learning-based natural language processing model capable of capturing complex semantic information through multi-head attention and bidirectional training. BERT can also be fine-tuned for specific natural language processing tasks. Thus, by using BERT to solve a text classification problem within the company in question, the model can learn the company's specific jargon. For example, if the company uses particular technical terms or acronyms, the model can be trained to understand and use them in its predictions. This helps improve accuracy, because the model is trained on data that is directly relevant to the business.
More specifically, in our case we will use the bert-base-uncased model in its classification variant. It comes with a dedicated classification architecture (a classification head on top of the encoder) that allows us to directly fine-tune the model for a multi-class problem.
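As a minimal sketch of this setup, the snippet below loads bert-base-uncased with a classification head using the Hugging Face transformers library. The number of target classes (here the number of teams) and the example incident text are assumptions for illustration.

```python
# Sketch: bert-base-uncased with a multi-class classification head.
from transformers import BertTokenizer, BertForSequenceClassification

num_teams = 12  # assumption: number of teams (target classes) in the dataset

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=num_teams,  # configures the classification head for multi-class output
)

# Tokenize a single (hypothetical) incident description and run a forward pass
inputs = tokenizer(
    "Server X is unreachable after the nightly deployment",
    truncation=True,
    padding="max_length",
    max_length=128,
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_teams): one score per team
```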
A first important factor to consider when preprocessing labels is how often each label occurs. Labels are often highly unbalanced, meaning that some appear much more frequently than others. This can make learning difficult, as rare labels may not have enough data for the model to find meaningful patterns.
A second factor is the complexity of the problem. When dealing with a large number of labels, the computational complexity of the model can increase significantly.
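A quick way to check both factors is to look at the label distribution before training. The sketch below assumes the incidents are available as a pandas DataFrame with a "team" column; both the file name and the column name are illustrative.

```python
# Sketch: inspecting how incidents are distributed across teams before training.
import pandas as pd

incidents = pd.read_csv("incidents.csv")  # assumption: export of incident tickets

label_counts = incidents["team"].value_counts()
print(label_counts)

# A large ratio between the most and least frequent team signals class imbalance
print("imbalance ratio:", label_counts.max() / label_counts.min())
# The number of distinct teams drives the size of the classification head
print("number of classes:", label_counts.shape[0])
```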
As mentioned earlier, achieving optimal training performance may require further balancing of the training dataset. To do so, the number of incidents in the training dataset is adjusted by equalising the number of incidents associated with each team. More precisely, we take as a reference the number of incidents of the team with the fewest.
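A simple way to implement this balancing step is to downsample every team to the size of the smallest one, as sketched below. The DataFrame, the "team" column name, and the random seed are assumptions to be adapted to the actual dataset.

```python
# Sketch: downsample each team to the size of the smallest team.
import pandas as pd

def balance_by_team(df: pd.DataFrame, label_col: str = "team", seed: int = 42) -> pd.DataFrame:
    # Number of incidents of the team with the fewest occurrences
    min_count = df[label_col].value_counts().min()
    # Sample min_count incidents from every team so all classes end up equal
    balanced = (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=min_count, random_state=seed))
    )
    return balanced.reset_index(drop=True)

# Usage (hypothetical): balanced_train = balance_by_team(train_df)
```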