
How to Choose a Model on Hugging Face: 5-Step Guide

Using Hugging Face is a great way to save time and resources when creating NLP applications without starting from scratch. However, it can be time-consuming to identify the most suitable pre-trained model for your specific use case. Even if you have found a model that fits your use case, you will still need to identify the appropriate Hugging Face datasets to assess the model's performance or for fine-tuning the model you found.

In this blog, our Head of Innovation at Nurdle AI, Nina Lopatina, will guide you through selecting Hugging Face models efficiently. To demonstrate the process, we’ll use an example of looking for a sentiment analysis classifier for a financial services brand. The process will help you save time and resources, enabling you to build advanced NLP applications easily.


5 Steps To Choosing Hugging Face Models

Step 1: Source a sentiment analysis model.

The first step is to click on the Models tab from the Hugging Face home page. As of this writing, there are 414,079 Hugging Face models to choose from. We’ll enter “sentiment analysis” in the search bar to get started.
Entering “sentiment analysis” narrows the list to 785 models, which is still a lot to assess.
You can further narrow your search by filtering by the type of model you are looking for. Sentiment analysis is a text classification task, which you can select under Natural Language Processing in the left-side navigation bar.
Next, select a model architecture you are familiar with, such as the popular Bidirectional Encoder Representations from Transformers (BERT).
Expert Tip:

Select an architecture that you are familiar with and that you know performs well for your task, such as BERT. Since thousands of variants are available, here are a few ways to narrow your search for a BERT model. If you are looking for a more general model, you can narrow the options by checking the number of downloads. The higher the number of downloads, the more popular the model and, therefore, the more likely it is to work out of the box.

On the other hand, if you are looking for a more specialized model, the number of downloads is less critical. Instead, you can look for models according to language, for example, if you need an Indonesian model, or according to industry, if you need a model for a specific domain such as finance.
The first listing, a DistilRoBERTa model, has 6.52 million downloads, a strong signal that it is popular and works well out of the box. It was also trained on financial news, which is pertinent to our example.
The names of the models are as clever as they are descriptive. DistilRoBERTa, for example, descends from BERT, which is designed to understand and generate human language by considering the context of words and sentences from both directions (left to right and right to left). RoBERTa is a more robust pretraining of the original BERT model (thus the “Ro”). Knowledge distillation is a common technique for reducing model size while maintaining performance; DistilBERT, for example, is distilled during pre-training and is 40% smaller than BERT. DistilRoBERTa combines the robustly optimized pretraining approach of RoBERTa with the pretraining distillation of DistilBERT for a smaller, more robust model. We can also gather from the full name that this base DistilRoBERTa was fine-tuned for sentiment analysis on financial news.

Some of the models in the search results, such as heBERT for Hebrew text, Roberta-base-Indonesian for Indonesian text, and CAMeLBERT for Arabic text, have been trained on specific languages; you can eliminate these if your data isn’t in those languages. Note, however, that within their own language these models are typically more performant than multilingual ones.

Clicking on a model name leads to a model card containing documentation about its performance characteristics, including how it was trained, its reported accuracy, the training dataset and training hyperparameters, and framework versions.
At Nurdle, we use a kernel of real-world data to generate “lookalike” synthetic datasets for AI. Nurdle’s synthetic datasets achieve 92% of the accuracy of real human data at just 5% of the cost, so you get highly relevant training data for your AI projects without incurring high costs.

Nurdle is designed to replace the use of actual human-generated unstructured text data. The cost of labeling 100,000 rows of real data from one of the largest data labeling providers is about $10,000 (and you have to deal with all of the privacy and regulatory risks of storing, transferring, and using real data). Nurdle produces synthetic datasets for AI that are nearly identical, pre-labeled, and with no regulatory or privacy vulnerabilities for about $500.

To demonstrate this, Nurdle conducted a double-blind case study comparing our synthetic training data against our competitors Gretel and Mostly AI, as well as human-generated, human-labeled data (via Scale). We wanted to see which provider produced the most accurate synthetic dataset, and how much it costs each provider to produce a usable synthetic training set.

We began by analyzing a sample dataset comprising 7,500 rows of genuine data, which included 5,500 rows labeled for insult. Each provider generated the same quantity of synthetic data, and we tested each dataset to evaluate its accuracy. The results were clear: Nurdle was the top performer, producing highly accurate synthetic data at a fraction of the cost of human-labeled data.

Step 2: After identifying a few Hugging Face models, test them out quantitatively.

Once you’ve selected a few models that seem appropriate to your use case, you can test their performance in a few ways. To get a quick directional sense of performance, you can enter a few examples directly into the hosted inference widget on the model card page, if the model offers an inference API. To test more thoroughly, load the model in your codebase and run it on a set of positive and negative text examples, both relevant and irrelevant to your brand, so that you can assess how well the model performs for your use case.
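As a sketch of the second approach, a candidate model can be loaded with the transformers pipeline and spot-checked on a handful of hand-picked sentences. The `mrm8488/` namespace prefix below is our assumption for the financial-news DistilRoBERTa discussed here; verify the exact model ID on its model card before using it.

```python
# Spot-check a candidate sentiment model on a few hand-picked sentences.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis",
)

examples = [
    "Apple is facing some headwinds this week.",  # in-domain, negative
    "This movie was utter garbage.",              # off-domain, negative
]
for text in examples:
    # Each result is a dict with a predicted label and its probability.
    print(text, "->", classifier(text)[0])
```

Swapping in each shortlisted model ID lets you run the same sentences through every candidate and compare predictions side by side.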
Expert Tip:

Choosing domain-specific text examples when testing a model is crucial. Also, remember that not all models can accurately detect negated or ambiguous text. Therefore, it is necessary to provide examples from various categories, such as positive, neutral, and negative sentiment, negation, and other edge cases. This approach will assist in understanding how well the identified Hugging Face model will perform on a diverse range of text examples in your specific domain.
Because our example involves a financial services company, we begin by entering a negative sentence in the financial domain: “Apple is facing some headwinds this week.” The model performs very well when the sentence is within the financial domain: a 99.6% prediction probability of negative sentiment, which is the correct classification.
It performs markedly worse in identifying a negative sentiment when the sentence is unrelated to finance, like "This movie was utter garbage." This came back as a 99.8% prediction probability for neutral sentiment, indicating that this model is highly specialized for the financial domain.
Based on a quick glance at two examples, this model warrants further consideration for financial sentiment analysis but would not work as a general-purpose sentiment classifier. Since this preliminary assessment was promising, we would next evaluate the model on a full evaluation dataset.
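That full evaluation boils down to running the model over a labeled set and scoring it. A minimal sketch follows; the gold labels and predictions below are illustrative placeholders, not real model output:

```python
# Score predicted labels against gold labels for a small eval set.
def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Illustrative gold labels and model predictions for four test sentences.
gold  = ["negative", "negative", "positive", "neutral"]
preds = ["negative", "neutral",  "positive", "neutral"]
print(f"accuracy = {accuracy(preds, gold):.2f}")  # 3 of 4 correct -> 0.75
```

In practice you would also break the score down by category (positive, neutral, negative, negation, edge cases) to see exactly where a model falls short.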

We also entered these same examples into another model: bertweet-base-sentiment-analysis, another BERT-based model, but fine-tuned on tweets rather than financial text. Our financial example came back with an incorrect prediction of positive sentiment. Given the poor fit of the training data to our use case and the model’s high confidence (88.8%) in an incorrect prediction, we would not further consider this model for financial sentiment analysis, as we had expected.
It does, however, better discriminate intent for a more colloquial and non-financial statement:
If we were looking for a more general model, we would consider a further evaluation of this model.
Expert Tip:

Choosing a Hugging Face model with an inference API will allow you to easily deploy the pre-trained model into your production environment via an Inference Endpoint, which provides autoscaling infrastructure managed by Hugging Face.

Step 3: Check the license.

All kinds of factors come into play when selecting a model, and licensing is one of them. Not all models are licensed for commercial use!

Check out the License section of the model page before using it.
The DistilRoBERTa model has an Apache 2.0 license that allows for commercial use.
Expert Tip:

Finding a model licensed for commercial use on Hugging Face can be tricky, but don’t worry too much if you can’t find one. If a model is exactly what you need for sentiment analysis but isn’t licensed for commercial use, you can use its documentation and relevant publications to guide your own pre-training or fine-tuning process. The model you found can still be used for testing and benchmarking.

Step 4: Start using the Hugging Face model.

After further evaluation of distilroberta-finetuned-financial-news-sentiment-analysis on a more comprehensive test set, we found that its accuracy on our dataset reaches our desired performance target. Since we have a production deployment environment set up, we are opting to deploy the model there rather than using Hugging Face’s inference API endpoint. To load the model in your own coding environment, click on the </> Use in Transformers tab in the upper right-hand corner of the model page.

A dialog box will pop up, providing everything you need to enter into your code to load this specific model.
You can use the pre-configured pipeline or directly load the model and tokenizer.

Before using either option, you need the transformers library installed, which can be done in your shell environment (for example, pip install transformers) and then used from either Python or a Jupyter notebook.
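The dialog’s snippet looks roughly like the following, with both options shown side by side. The `mrm8488/` namespace prefix is our assumption for the model discussed in this example:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"

# Option 1: the pre-configured pipeline bundles tokenizer, model, and
# post-processing into one callable.
sentiment = pipeline("text-classification", model=model_name)

# Option 2: load the model and its matching tokenizer directly for more
# control (e.g., batching, custom post-processing, fine-tuning).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Shares rallied after the earnings call.",
                   return_tensors="pt")
logits = model(**inputs).logits  # raw scores, one per sentiment class
```

Note that Option 2 pulls the tokenizer from the same repository as the model, which is exactly the consistency the tip below calls for.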
Expert Tip:

Be sure to use the same tokenizer used in the model's training phase to tokenize your data before you use it to do additional fine-tuning or for inference. Consistency in tokenization is vital because the model has been trained to understand and work with tokenized data in a specific way. Using a different tokenizer during inference will not yield viable results.

Step 5: Fine-tune the Hugging Face model.

Hugging Face allows you to fine-tune the model directly from the Hugging Face AutoTrain product feature. AutoTrain is a no-code tool for training a model you found in your model search using either a Hugging Face dataset or a custom dataset. This feature allows non-experts to train highly performant models and get them deployed at scale quickly and efficiently.
In conclusion, harnessing the power of the Hugging Face model hub for your data science workflow can be a game-changer in developing advanced NLP applications efficiently. The five-step process outlined here (sourcing models, quantitative testing, verifying licensing, implementation, and fine-tuning) provides a systematic approach to selecting the right Hugging Face model for your specific needs. With these steps, guided by Nina Lopatina's expertise, navigating the vast repository of models becomes more streamlined, allowing you to save time and resources while building tailored NLP solutions. Remember, carefully selecting and evaluating models aligned with your domain can significantly impact the success of your projects. So dive into Hugging Face equipped with these insights to expedite your NLP endeavors and propel your data science workflows forward.

About Nurdle

When you are ready to fine-tune the model you choose in Hugging Face, Nurdle AI can provide the training data for you. Nurdle AI stands out from Hugging Face datasets due to its focus on tailoring synthetic datasets to meet the unique needs of companies. Nurdle AI excels in customization, allowing for the generation of synthetic datasets that closely replicate the intricacies of specific real-world data, including patterns, distributions, and complexities. Its emphasis on privacy preservation ensures the creation of datasets that maintain statistical accuracy while safeguarding sensitive information, a feature particularly relevant for companies navigating privacy regulations. Moreover, Nurdle AI offers scalability and flexibility in dataset generation, accommodating diverse sizes and complexities required for training and validating various AI models. Additionally, its capabilities in bias mitigation and the creation of custom scenarios provide a level of control that may surpass the offerings available through datasets on Hugging Face, making Nurdle AI a valuable tool for companies seeking tailored, privacy-conscious synthetic datasets.