Transfer Learning: Leveraging Existing Knowledge to Enhance Your Models

March 20, 2023 Jaiganesh Nagidi

Transfer Learning is a popular technique used in deep learning where models trained for a particular task are reused as a base model or starting point for another model intended to perform another task.

In deep learning terminology, such reusable models are called pre-trained models.

Pre-trained models are used as the starting point for deep learning tasks in computer vision and natural language processing, given the vast computing resources and time needed to build neural network models for these kinds of problems.

With the transfer learning approach, the model-building phase can be shortened dramatically.

In this article, we will learn what transfer learning is, its importance, and its implementation in Python.

Introduction

Computer vision, natural language processing, and speech recognition are just a few of the fields that have been transformed by deep learning.

Deep neural networks must be trained on enormous quantities of labelled data, which can be expensive and time-consuming to acquire.

So it would be good to have a mechanism for reusing what an already-built model has learned in other use cases.

That is the idea from which transfer learning was born.

Put more technically,

“Transfer learning is a deep learning technique whose goal is to use previously trained models to complete new tasks”.

What Is Transfer Learning?

Transfer learning in deep learning involves using pre-trained models as a starting point for training a new model on a related task.

A pre-trained model is a deep neural network trained on a large dataset for a specific task, such as image classification or object detection.

The pre-trained model has learned to extract useful features from the input data, and these features can be used to solve related tasks.

In transfer learning, the pre-trained model is typically used in one of two ways.

First, the pre-trained model can be used as a feature extractor, where the output of one or more layers of the pre-trained model is fed as input to a new model.

Alternatively, the pre-trained model can be fine-tuned, where the weights of some or all layers of the pre-trained model are adjusted during training to fit the new task better.
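
The two usage modes can be sketched in Keras roughly as follows. This is a minimal illustration rather than the article's original code: the helper names `extract_features` and `unfreeze_last_layers` are hypothetical, and `base` is assumed to be any Keras model, such as a pre-trained backbone loaded without its classification head.

```python
import tensorflow as tf


def extract_features(base, images):
    """Feature extraction: freeze the base and return its outputs,
    which can then be fed as inputs to a separate new model."""
    base.trainable = False
    return base.predict(images, verbose=0)


def unfreeze_last_layers(base, n_layers):
    """Fine-tuning: leave only the last n_layers of the base trainable."""
    base.trainable = True
    cutoff = max(0, len(base.layers) - n_layers)
    for layer in base.layers[:cutoff]:
        layer.trainable = False
    return base


# Hypothetical usage with a pre-trained backbone (downloads weights on first use):
# base = tf.keras.applications.Xception(weights="imagenet", include_top=False)
# features = extract_features(base, images)  # images: floats, shape (n, 299, 299, 3)
# unfreeze_last_layers(base, n_layers=20)    # 20 is an arbitrary illustrative value
```

How many layers to unfreeze is a judgment call that depends on how much data is available and how close the new task is to the original one.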

Why Do We Need Transfer Learning?

Transfer learning matters in deep learning because it can drastically reduce the amount of labelled data needed to train a new model. Using a pre-trained model allows the new model to start with a robust set of initial parameters and learn more quickly than it would when starting from scratch.

Transfer learning can also enhance the new model's generalisation ability, enabling it to perform well on brand-new datasets.

Transfer learning in deep learning is hugely beneficial for tasks requiring large amounts of labelled data, such as speech recognition or medical imaging.

Additionally, it can be utilised to boost a model's efficiency in areas with constrained computational capabilities, such as edge computing or mobile devices.

Transfer Learning Workflow

As the flow chart shows, model 01 is trained on data 01 and model 02 is trained on data 02.

However, model 02 also reuses the knowledge that model 01 gained during training, in addition to learning from data 02.

Difference Between Machine Learning and Transfer Learning

Traditional Machine Learning is a paradigm where a separate model is trained for each specific task. Each model learns from scratch and does not transfer any knowledge or information from previous tasks.

This means that the training process for each model is independent and doesn't take advantage of any previously learned information.

On the other hand, transfer learning is a machine learning technique that leverages previously learned knowledge to solve a new problem. In transfer learning, a model trained on one task is used as a starting point for another related task.

The idea is that the knowledge gained from the previous task can be reused in the new task, thereby reducing the computational cost and improving performance.

In the figure provided, we can see how transfer learning works. The knowledge gained by a model trained to detect circles in one domain is transferred to a different model to detect square shapes in another domain.

This way, the weights and biases learned by the first model can be used as a starting point for the second model, saving time and resources.

Transfer learning is particularly useful in scenarios where there is limited data available for a new task or when the new task is related to a previously learned task.

It can improve the model's performance while reducing the time and resources needed for training.

Pros and Cons of Transfer Learning

Transfer Learning Pros

Faster Training:
- Transfer learning can accelerate the training process of a new model by leveraging pre-trained models that have already learned general patterns from a large dataset.
Higher Accuracy:
- Pre-trained models are often trained on very large datasets, which means they have already learned complex features and patterns that can be applied to a new dataset.
- This can result in higher accuracy when training a new model.
Less Data Required:
- Since the pre-trained model has already learned general patterns from a large dataset, less data is required to train a new model.
- This is especially useful when working with small datasets, which may not be sufficient to train a model from scratch.
Lower Computational Resources:
- Training a deep learning model from scratch can require significant computational resources.
- Developers can use a pre-trained model to reduce the computational resources required to train a new model.

Transfer Learning Cons

Limited Flexibility:
- Pre-trained models are trained on a specific dataset, which means they may only be suitable for some tasks.
- We may need to fine-tune the pre-trained model or use different layers to adapt the model based on the task.
Overfitting:
- Transfer learning may result in overfitting, where the model performs well on the training data but poorly on new data.
- This can happen if the pre-trained model is too complex or if the new data is too different from the data used to train the pre-trained model.
Limited Generalization:
- Although pre-trained models can learn general patterns, they may not be able to generalize to new data that is significantly different from the training data.
- We need to fine-tune the pre-trained model or use different layers to improve the model's generalisation.

Popular Pretrained Models

There are different variants of pre-trained neural networks. Each has its own size, architecture, and speed, along with its own advantages and disadvantages.

Let’s have a look at some of the popular pre-trained models.

Inception

The “Inception” model is a micro-architecture that was first introduced by Szegedy et al. in their 2014 paper, Going Deeper with Convolutions.

It is a Convolutional Neural Network (CNN) that is 22 layers deep when counting only layers with parameters (27 if pooling layers are included). Its building blocks are the inception layers, which are the central notion of this sparsely connected architecture.

The inception layer combines several layers in parallel, such as a 1x1 convolutional layer, a 3x3 convolutional layer, and a 5x5 convolutional layer, with their output filter banks concatenated into a single output that forms the input of the next stage.
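
The parallel-branch idea can be sketched with the Keras functional API. This is a simplified illustration under stated assumptions: the function name `naive_inception_module` and the filter counts are hypothetical, and the module in the paper additionally includes a pooling branch and 1x1 reduction layers that are omitted here.

```python
import tensorflow as tf


def naive_inception_module(x, f1, f3, f5):
    """Apply 1x1, 3x3 and 5x5 convolutions in parallel to the same input
    and concatenate their outputs along the channel axis."""
    layers = tf.keras.layers
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    # "same" padding keeps height and width equal, so only channels add up.
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5])


inputs = tf.keras.Input(shape=(28, 28, 16))
outputs = naive_inception_module(inputs, f1=8, f3=16, f5=4)
print(outputs.shape)  # channels: 8 + 16 + 4 = 28
```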

Xception

Xception stands for "Extreme Inception". It is a relatively lightweight model proposed by François Chollet, the creator of the Keras library, in a paper published in 2017.

Xception replaces the Inception modules with depthwise separable convolutions. Each one is a depthwise convolution, which applies one spatial filter per input channel, followed by a pointwise (1x1) convolution that mixes the channels.
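
A quick parameter count shows why splitting a convolution into depthwise and pointwise steps is cheaper. The sketch below ignores bias terms, and the layer sizes (3x3 kernel, 128 input channels, 256 output channels) are arbitrary illustrative values, not figures from the Xception paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: every output channel has a
    kernel x kernel filter over every input channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution mixing the channels
    return depthwise + pointwise


print(standard_conv_params(3, 128, 256))   # 294912
print(separable_conv_params(3, 128, 256))  # 33920
```

For these sizes the separable version needs roughly a ninth of the weights.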

ResNet

ResNet quickly became one of the most popular architectures in various computer vision tasks. It was first introduced by He et al. in their 2015 paper, Deep Residual Learning for Image Recognition.

It is a deep neural network built on the core idea of the so-called identity shortcut connection, which skips one or more layers. It has improved results on vision tasks such as image classification, object detection, and face recognition.
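
The identity shortcut can be illustrated without any deep learning library: the block's output is the transformed input plus the unchanged input. The function names below are hypothetical, and plain Python lists stand in for tensors.

```python
def residual_block(x, transform):
    """Return transform(x) + x element by element (identity shortcut)."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input length")
    return [f + xi for f, xi in zip(fx, x)]


def zero_transform(x):
    """Stand-in for layers that have learned to output nothing."""
    return [0.0 for _ in x]


# If the skipped layers contribute nothing, the block passes x through unchanged.
print(residual_block([1.0, 2.0, 3.0], zero_transform))  # [1.0, 2.0, 3.0]
```

Because the block can fall back to the identity this easily, stacking many such blocks is less likely to hurt performance than stacking plain layers.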

VGGNet

VGGNet was introduced by Simonyan and Zisserman in their 2014 paper, Very Deep Convolutional Networks for Large-Scale Image Recognition. It is a deep CNN whose configurations go up to 19 weight layers.

VGG is also one of the most used architectures for image recognition.

When To Use Transfer Learning?

Here are some situations where transfer learning can be particularly effective:

Limited Data

Transfer learning can be useful when you only have a small amount of data available to train a model because the pre-trained model already has knowledge of the domain and can be used to enhance the model's performance.

For instance, if you want to categorize photographs of rare animals but only have a small number of labelled images, transfer learning can be used to adapt a pre-trained model to your specific task and boost its accuracy.

Limited Computing Resources

Training a deep neural network from scratch can be time- and resource-intensive.

By starting with a pre-trained model that has already picked up useful features and then customizing it for your particular task, transfer learning can save time and resources.

New Domains

When you want to apply machine learning to a new domain you have limited knowledge of, transfer learning can leverage the expertise of pre-trained models that have already learned about the domain.

For example, suppose you want to classify medical images but lack expertise in the medical field.

In that case, you can use transfer learning to adapt a pre-trained model that has already learned useful features in the medical domain.

Small Improvements

In some cases, transfer learning can be used to achieve small improvements in model performance.

For example, suppose you have a model that is performing well on a specific task but you want to make small improvements to its performance. In that case, you can use transfer learning to fine-tune the model on your specific task and improve its accuracy.

Overall, transfer learning can be a powerful approach for machine learning practitioners who want to leverage existing knowledge and resources to solve new problems.

Stages In Transfer Learning

1. Select a pre-trained model

The first step in transfer learning is to select a pre-trained model that is appropriate for the task at hand. The pre-trained model should have been trained on a similar dataset or task to the one you want to solve.

2. Choose layers to retrain

Once you have selected a pre-trained model, you need to decide which layers to retrain for your specific task. The lower layers of the pre-trained model are usually good for extracting general features, while the higher layers are better for task-specific features.

3. Prepare the data

Next, you need to prepare your own dataset for your specific task. This may involve resizing images, normalizing data, or performing other data preprocessing steps.

4. Re-train the model

After selecting the layers to retrain and preparing the data, you can begin retraining the model. The process of retraining the model involves freezing the pre-trained layers and training the new task-specific layers on your own data.

5. Fine-tune the model

Once the new layers have been trained, you can fine-tune the entire model by unfreezing some or all of the pre-trained layers and continuing to train the model on your data.

6. Evaluate the model

Finally, you need to evaluate the performance of the model on a validation set. This will give you an idea of how well the model is performing and whether any additional tweaks or adjustments need to be made.
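
Stages 2 to 6 can be sketched in Keras as a two-phase training routine. This is one possible arrangement under stated assumptions, not the article's original code: the function names, learning rates, and epoch count are hypothetical choices, and `base` is assumed to be a pre-trained Keras model loaded without its classification head.

```python
import tensorflow as tf


def build_model(base, num_classes):
    """Freeze the pre-trained base and add a new task-specific head."""
    base.trainable = False
    inputs = tf.keras.Input(shape=base.input_shape[1:])
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


def train_in_two_phases(model, base, train_data, val_data, epochs=1):
    """Train the new head, then fine-tune everything, then evaluate."""
    x_train, y_train = train_data
    x_val, y_val = val_data
    loss = "sparse_categorical_crossentropy"

    # Stage 4: only the new head is trainable.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss=loss, metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=epochs, verbose=0)

    # Stage 5: unfreeze the base and recompile with a much lower learning rate.
    base.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss=loss, metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=epochs, verbose=0)

    # Stage 6: evaluate on held-out data; returns (loss, accuracy).
    val_loss, val_acc = model.evaluate(x_val, y_val, verbose=0)
    return val_loss, val_acc
```

Recompiling after changing `trainable` is needed for the change to take effect, and the small learning rate in the second phase is meant to avoid destroying the pre-trained weights.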

Image Classification with Transfer Learning

Image classification with pre-trained models is a popular technique in deep learning and computer vision that allows developers and researchers to leverage pre-existing, large-scale neural networks trained on vast datasets to classify new images.

The technique involves using a pre-trained model that has been trained on a large image dataset, such as ImageNet, to classify new images based on their visual features.

Building Image Classification Model

We load the pre-trained Xception model with its trained weights. Calling summary() prints a summary of the model architecture.
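
The original listing is not reproduced here. A minimal sketch of this step, assuming TensorFlow's bundled Keras applications and a hypothetical wrapper named `load_xception`, could look like this:

```python
import tensorflow as tf


def load_xception(weights="imagenet"):
    """Build Xception with its classification head.
    weights="imagenet" downloads the pre-trained weights on first use;
    weights=None builds the same architecture with random weights."""
    return tf.keras.applications.Xception(weights=weights)


# Typical use (requires an internet connection the first time):
# model = load_xception()
# model.summary()
```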

With this script, the pre-trained Xception model is loaded. Now let’s predict the class of the test image below.

Image Classification Model Evaluation

Here, we load the test image and resize it to the shape the Xception network expects, which is 299x299.
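
A sketch of this preparation step, assuming the Keras image utilities (which need Pillow installed) and a hypothetical helper named `load_and_prepare`:

```python
import numpy as np
import tensorflow as tf


def load_and_prepare(path, size=(299, 299)):
    """Load an image file and turn it into a batch Xception can consume."""
    img = tf.keras.utils.load_img(path, target_size=size)  # resized PIL image
    x = tf.keras.utils.img_to_array(img)                   # (299, 299, 3) floats
    x = np.expand_dims(x, axis=0)                          # add batch dimension
    # Xception's preprocessing scales pixel values from [0, 255] to [-1, 1].
    return tf.keras.applications.xception.preprocess_input(x)


# Hypothetical usage, with "test_image.jpg" standing in for your own file:
# batch = load_and_prepare("test_image.jpg")
# predictions = model.predict(batch)
```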

Here, we load the ImageNet labels, iterate over every label, and return the top 3 predictions for the given input image. You can observe the results in the image below.

The pre-trained model predicted the class of the given input image with about 94% confidence. You can try different images and observe how the model performs on them.
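
The ranking logic behind a top-3 readout is simple enough to sketch in plain Python. The labels and probabilities below are made-up illustrative values, not the article's results, and `top_k_predictions` is a hypothetical helper. For real ImageNet outputs, Keras provides `decode_predictions` in `tf.keras.applications.xception`, which does the label lookup for you.

```python
def top_k_predictions(probabilities, labels, k=3):
    """Pair each label with its probability and return the k most likely."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per probability")
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["tabby", "tiger_cat", "Egyptian_cat", "lynx"]  # hypothetical labels
probs = [0.10, 0.05, 0.80, 0.05]                         # hypothetical scores
print(top_k_predictions(probs, labels))
# [('Egyptian_cat', 0.8), ('tabby', 0.1), ('tiger_cat', 0.05)]
```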

Conclusion

In conclusion, transfer learning is a powerful technique in deep learning that allows users to leverage pre-trained models to accelerate the training process of a new model.

It can result in faster training, higher accuracy, and lower computational requirements.

However, some potential drawbacks include limited flexibility, overfitting, unintended biases, and limited generalisation. Overall, transfer learning can be a valuable tool for various Deep Learning applications.

I hope this article helped you understand transfer learning. As an exercise, build an image classification model on custom data and see how well it performs.