Working with Pre-Trained Deep Learning Models for NLP


Deep learning is a subset of artificial intelligence that uses artificial neural networks to model and imitate the functioning of the human brain. This article takes a look at how to work with pre-trained deep learning models for natural language processing (NLP).

Deep learning powers numerous artificial intelligence apps and services that automate analytical and physical processes. It is currently used in a very wide range of applications in the government, corporate and social sectors.

A few key applications of deep learning in real-world domains are:

  • Real-time computer vision and image analytics
  • Virtual assistants
  • Automated manufacturing
  • Speech recognition (vocal artificial intelligence)
  • Data science and engineering
  • Entertainment and musical notations
  • Stock trading and financial data analytics
  • Shopping patterns analysis in e-commerce
  • Sentiment analysis on social media
  • Customer relationship management systems
  • Advertising and promotional activities
  • Autonomous vehicles, self-driving cars and drones
  • Natural language processing (NLP)
  • Fraud detection and cyber security
  • Emotional intelligence
  • Healthcare and medical diagnosis
  • Investment modelling

    Figure 1: Key applications of deep learning

Classification of models in deep learning

Many deep learning models are designed for a specific application area. In each field of research and application, a particular model is chosen because it offers better effectiveness, performance and accuracy for that kind of data (see Table 1).

Table 1: Deep learning models and use cases

Deep learning model | Use cases
Classic neural networks (multilayer perceptrons) | Tabular data analysis, classification and regression based problem solving
Convolutional neural networks | Image data sets, optical character recognition (OCR) intelligence
Recurrent neural networks | Image classification, image captioning, sentiment analysis, video classification
Self-organising maps (SOM) | Dimensionality reduction, music, video
Auto encoders | Huge data sets, recommendation engines, dimensionality reduction
Boltzmann machines | Monitoring and surveillance based applications

Pre-trained models for multiple research domains

Pre-trained models are used to implement deep learning rapidly and with high accuracy (see Table 2). These models come with trained weights, which researchers and scientists can import to deploy a deep learning application quickly in a particular domain without training a model from scratch.

Object detection and image analytics
  • Xception
  • VGG16
  • VGG19
  • ResNet
  • ResNetV2
  • ResNeXt
  • InceptionV3
  • InceptionResNetV2
  • MobileNet
  • MobileNetV2
  • DenseNet
  • NASNet
  • YOLO
Natural language processing
  • OpenAI GPT-3
  • Google BERT
  • Google ALBERT
  • Google Transformer-XL
  • ULMFiT
  • Facebook RoBERTa
  • Microsoft CodeBERT
  • ELMo
  • XLNet
Audio and speech
  • Wavenet
  • Lip Reading
  • MusicGenreClassification
  • Audioset
  • DeepSpeech
  • Waveglow
  • Loop
  • TTS
  • MXNET-Audio

The key advantages of using libraries based on pre-trained models are:

  • Inclusion of pre-trained weights with NLP architectures
  • Inclusion of fine-tuning with pre-processing
  • Easy-to-use scripts and APIs
  • Multilingual support with international and regional languages
  • Compatibility with graphics processing unit (GPU)
  • Pre-programmed algorithms from leading companies

Installation and working with pre-trained NLP based models

HuggingFace is one of the key platforms that provide pre-trained models for natural language processing (NLP). It is cloud based and can be integrated with Google Colab for running scripts.

Figure 2: Online platform for pre-trained models

To install the pre-trained NLP based models in Google Colab, execute the following:

! pip install pytorch-transformers
! pip install transformers 
! pip install sentencepiece
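
Before running the examples, it can help to confirm that the packages are importable in the current runtime. The following is a minimal sketch that uses only the Python standard library; the helper name `is_installed` is hypothetical and is not part of any of the packages above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Module names as they are imported in this article
# (note the underscore in pytorch_transformers).
for name in ("torch", "pytorch_transformers", "transformers", "sentencepiece"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Any module reported as missing can be installed with the corresponding pip command shown above.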

Prediction of the next sequence in Google Search

When we type some text into Google Search, the back end suggests how the query might continue. For example, if we want to predict the next word after the phrase ‘What is the name of the Indian’, the following pre-trained GPT-2 transformer can be used:

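Figure 3: Prediction of the next sequence in Google Search

import torch
# pytorch_transformers is the earlier name of the transformers library
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

# Use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the pre-trained tokenizer and encode the input text
mytokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text = "What is the name of the Indian"
indexed_tokens = mytokenizer.encode(text)
tokens_tensor = torch.tensor([indexed_tokens]).to(device)

# Load the pre-trained weights and switch to evaluation mode
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()
model.to(device)

# Predict the most likely next token
with torch.no_grad():
    outs = model(tokens_tensor)
    preds = outs[0]
pred_index = torch.argmax(preds[0, -1, :]).item()
pred_text = mytokenizer.decode(indexed_tokens + [pred_index])
print(pred_text)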

Depending on the input text, the predicted next word could be one of the following:

  • flag
  • parliament
  • others, depending upon the search
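
The last step of the code above, choosing the highest-scoring token at the final position, can be illustrated without downloading a model. The sketch below is only an illustration: the five-word vocabulary and all the scores are made up, whereas a real GPT-2 model returns one score for every entry in its vocabulary at every input position.

```python
def argmax(values):
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical vocabulary and made-up scores, for illustration only.
vocab = ["flag", "parliament", "army", "people", "government"]

# One row of scores per input position, one column per vocabulary entry.
logits = [
    [0.1, 0.3, 0.2, 0.9, 0.4],  # scores after the first input token
    [0.2, 0.1, 0.7, 0.3, 0.5],  # scores after the second input token
    [1.8, 1.2, 0.4, 0.6, 0.9],  # scores after the last input token
]

# Greedy prediction only looks at the row for the last position.
pred_index = argmax(logits[-1])
print(vocab[pred_index])  # prints "flag"
```

This mirrors `torch.argmax(preds[0, -1, :])`: index 0 selects the only sequence in the batch, -1 selects the last position, and the argmax runs over the vocabulary.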

HuggingFace provides pre-trained models for a very wide range of applications and is used by numerous corporate giants.

Figure 4: Key organisations using

Prediction of a word when filling in the blanks

The classic fill-in-the-blanks task, familiar from real-time search suggestions, can be solved using a pre-trained NLP model.

Here is an example to predict the word that can be used in place of [MASK].

from transformers import pipeline
myprediction = pipeline('fill-mask', model='bert-base-uncased')
myprediction("This is a [MASK].")

The output is:

[{'score': 0.03235777094960213,
  'sequence': 'this is a dream.',
  'token': 3959,
  'token_str': 'dream'},
 {'score': 0.030467838048934937,
  'sequence': 'this is a mistake.',
  'token': 6707,
  'token_str': 'mistake'},
 {'score': 0.028352534398436546,
  'sequence': 'this is a test.',
  'token': 3231,
  'token_str': 'test'},
 {'score': 0.025175178423523903,
  'sequence': 'this is a game.',
  'token': 2208,
  'token_str': 'game'},
 {'score': 0.024909017607569695,
  'sequence': 'this is a lie.',
  'token': 4682,
  'token_str': 'lie'}]
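
Each `score` in this output is a probability for the masked position, and the list holds the five most probable tokens. A common way to arrive at such values is to apply a softmax to the model's raw scores and keep the top few entries. The sketch below shows that idea with a made-up six-word vocabulary and made-up raw scores; the function names `softmax` and `top_k` are hypothetical helpers, not the pipeline's API.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(vocab, logits, k):
    """Return the k most probable (token, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical vocabulary and raw scores for the masked position.
vocab = ["dream", "mistake", "test", "game", "lie", "banana"]
mask_logits = [2.0, 1.9, 1.8, 1.7, 1.6, -3.0]

for token, prob in top_k(vocab, mask_logits, k=3):
    print(f"{token}: {prob:.3f}")
```

Because the probabilities are spread over the whole vocabulary, even the best candidate can have a small score, as in the output above.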

The same pipeline can be applied to another sentence:

from transformers import pipeline
myprediction = pipeline('fill-mask', model='bert-base-uncased')
myprediction("He is a [MASK].")

The output is:

[{'score': 0.17371997237205505,
  'sequence': 'he is a christian.',
  'token': 3017,
  'token_str': 'christian'},
 {'score': 0.08878538012504578,
  'sequence': 'he is a democrat.',
  'token': 7672,
  'token_str': 'democrat'},
 {'score': 0.06659623980522156,
  'sequence': 'he is a republican.',
  'token': 3951,
  'token_str': 'republican'},
 {'score': 0.03911091387271881,
  'sequence': 'he is a vegetarian.',
  'token': 23566,
  'token_str': 'vegetarian'},
 {'score': 0.036758508533239365,
  'sequence': 'he is a catholic.',
  'token': 3234,
  'token_str': 'catholic'}]
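
The pipeline returns an ordinary list of dictionaries, so the result can be processed with plain Python. The following sketch works on a copy of the output above, with the `sequence` and `token` fields left out for brevity. The helper names and the 0.05 threshold are hypothetical choices made for this illustration.

```python
# Scores and tokens copied from the output above.
results = [
    {"score": 0.17371997237205505, "token_str": "christian"},
    {"score": 0.08878538012504578, "token_str": "democrat"},
    {"score": 0.06659623980522156, "token_str": "republican"},
    {"score": 0.03911091387271881, "token_str": "vegetarian"},
    {"score": 0.036758508533239365, "token_str": "catholic"},
]


def best_candidate(results):
    """Return the entry with the highest score."""
    return max(results, key=lambda r: r["score"])


def above_threshold(results, min_score):
    """Return the tokens whose score is at least min_score, in the given order."""
    return [r["token_str"] for r in results if r["score"] >= min_score]


best = best_candidate(results)
print(best["token_str"], round(best["score"], 3))
print(above_threshold(results, 0.05))

# Share of the total probability covered by the five listed candidates.
covered = sum(r["score"] for r in results)
print(f"top-5 probability mass: {covered:.3f}")
```

A threshold like this is one simple way to discard low-confidence suggestions before showing them to a user.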

Prediction of a word using a pre-trained NLP model

We can predict a word using a pre-trained NLP model as follows:

import torch
from pytorch_transformers import BertTokenizer, BertForMaskedLM

# Use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Loading of Tokenizer
mytokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Input Tokenization
text = "[CLS] Who was Puppet Expert ? [SEP] Puppet Expert was a puppeteer [SEP]"
t_Text = mytokenizer.tokenize(text)

# Masking of Token
masked_index = 8
t_Text[masked_index] = '[MASK]'
# The uncased model lowercases its input, so the tokens are lowercase
assert t_Text == ['[CLS]', 'who', 'was', 'puppet', 'expert', '?', '[SEP]', 'puppet', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Conversion of Token
i_token = mytokenizer.convert_tokens_to_ids(t_Text)
# Segment 0 covers the first sentence, segment 1 the second
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Conversion of Inputs to Tensors
t_tensor = torch.tensor([i_token]).to(device)
s_tensors = torch.tensor([segments_ids]).to(device)

# Loading of Weights from Pre-Training Model
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
model.to(device)

# Prediction of tokens
with torch.no_grad():
    outputs = model(t_tensor, token_type_ids=s_tensors)
    predictions = outputs[0]
pred_index = torch.argmax(predictions[0, masked_index]).item()
pred_token = mytokenizer.convert_ids_to_tokens([pred_index])[0]
print('Predicted token is:', pred_token)

The output is:

100%|██████████| 231508/231508 [00:00<00:00, 703114.82B/s]
100%|██████████| 433/433 [00:00<00:00, 81193.38B/s]
100%|██████████| 440473133/440473133 [00:20<00:00, 21132461.45B/s]
Predicted token is: Expert
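
In the code above, the segment IDs are written out by hand: 0 for every token of the first sentence up to and including its [SEP], and 1 for the rest. For longer inputs, they can be derived from the token list instead. The sketch below is one simple way to do this in plain Python; the helper name `segment_ids_from_tokens` is hypothetical, and the token list mirrors the example above.

```python
def segment_ids_from_tokens(tokens, sep_token="[SEP]"):
    """Assign 0 up to and including the first separator, and 1 afterwards."""
    ids = []
    current = 0
    for token in tokens:
        ids.append(current)
        # Switch to the second segment once the first separator has been seen.
        if token == sep_token and current == 0:
            current = 1
    return ids


tokens = ['[CLS]', 'who', 'was', 'puppet', 'expert', '?', '[SEP]',
          'puppet', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

segments_ids = segment_ids_from_tokens(tokens)
print(segments_ids)
# prints [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

The result has one entry per token, which is what the model expects when the list is converted to a tensor and passed as `token_type_ids`.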

Research scholars, academicians and practitioners working in the domain of speech and natural language processing can use free and open source pre-trained models for their research work as these enable a high degree of accuracy and performance with real-world data sets. The models available in such cloud based platforms are quite effective for dynamic applications including audio forensics, speech recognition, speech-to-text translation, language analytics, and many others.

