How Recurrent Neural Network (RNN) Works

November 5, 2020 Kaushik Das


If you know the basics of deep learning, you are probably familiar with how information flows from one layer to the next: the nodes in layer 1 pass information to the nodes in layer 2, and so on. But what about information flowing within the layer 1 nodes themselves? This is where the recurrent neural network (RNN) architecture comes in.

Suppose we are building a model to predict the next word in a sentence. How would you do that?

In this case, we need information about the previous words, carried over from the prior state, along with the input at the current node to generate the next word.

This is exactly the kind of problem the recurrent neural network, or RNN for short, is designed for.


Don't worry if the above example is not clear; going forward, we are going to learn in detail about RNN.

When I think about any deep learning model, the quote by Eliezer Yudkowsky flows in my mind like the way information flows in deep learning models 🙂

Here is the quote:

“Anything that could give rise to smarter-than-human intelligence—in the form of Artificial Intelligence, brain-computer interfaces, or neuroscience-based human intelligence enhancement - wins hands down beyond contest as doing the most to change the world. Nothing else is even in the same league.”

—Eliezer Yudkowsky

Curious to know why I quote a prominent researcher in the field of artificial intelligence?

If you are eager to explore the answers, then you are in the right place. Let’s explore further and gain a deeper understanding of these terms.


Before we learn about RNN, let's spend some time understanding the basic building blocks of deep learning models.

Introduction to Artificial Neural Networks

Neural networks are also called Artificial Neural Networks (ANN). The ANN is the basic neural network structure.

An ANN is made up of many interconnected neurons, loosely inspired by the human brain. Large networks can contain thousands of neurons or more.

In other words, neural networks are a set of algorithms that mimic the behavior of the human brain and are designed to recognize patterns in data.

Neural Network Architecture

Types of layers in ANN

Input Layer: Networks have only one input layer.
Hidden Layer: Networks have one or more hidden layers.
Output Layer: Networks have only one output layer.

Every layer contains one or more neurons. Each neuron connects to other neurons through specific values (weights and a bias) and applies an activation function.

All inputs and outputs are independent of each other.
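To make the roles of weights, bias, and activation function concrete, here is a minimal sketch of a forward pass through a tiny feed-forward network. The layer sizes, the weight values, and the choice of a sigmoid activation are assumptions made up for illustration; they do not come from any trained model.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes sigmoid(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs

# Hypothetical toy network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

def forward(x):
    """Input layer -> hidden layer -> output layer, with no memory in between."""
    hidden = dense_layer(x, hidden_weights, hidden_biases)
    return dense_layer(hidden, output_weights, output_biases)

# The same input always gives the same output, whatever was seen before it.
first = forward([1.0, 2.0])
forward([9.0, -4.0])  # an unrelated input in between changes nothing
second = forward([1.0, 2.0])
print(first, second)
```

Because nothing is carried over between calls to `forward`, the network treats every input independently, which is exactly the limitation that matters for sequence data.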

In the training phase, the data is passed through all the layers of neurons, and the neurons adjust their weights to learn patterns from the data.

ANNs work well for many tasks. In fact, on some problems they outperform popular machine learning models such as logistic regression, random forest, and support vector machines. But when we work with sequential data such as text or time series, a plain ANN does not perform well.

Because the inputs and outputs of an ANN are independent of each other, it has no knowledge of what came earlier in a sequence. This is the problem that RNNs (Recurrent Neural Networks) solve.

Before discussing RNNs, we need a little background on sequence modeling, because RNNs perform well when we work with sequence data.

Sequence Modeling

Sequence modeling is the task of predicting what comes next in a sequence, for example the next word or character. The model computes the probability of each word that could occur next in a particular sequence.

The word or character with the highest probability is taken as the output. Unlike in an ANN, the current output in sequence modeling depends not only on the current input but also on what came before it. This is the basic idea behind the RNN family.
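As a rough illustration of "compute the probability of each possible next word, then pick the most likely one", here is a small sketch that counts word pairs. Note that this is a simple bigram count model, not an RNN; it is swapped in only because it shows the idea in a few lines. The toy corpus and the function names are made up for this example.

```python
from collections import Counter, defaultdict

# Toy corpus invented for this example.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
follow_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[prev_word][next_word] += 1

def next_word_probabilities(prev_word):
    """Turn the raw follow counts for prev_word into probabilities."""
    counts = follow_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def predict_next(prev_word):
    """Return the word with the highest probability of coming next."""
    probs = next_word_probabilities(prev_word)
    return max(probs, key=probs.get)

probs = next_word_probabilities("the")
print(probs)
print(predict_next("the"))
```

A bigram model only looks one word back. An RNN aims to do better by carrying a summary of everything seen so far.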

Introduction to Recurrent Neural Network

The term neural network has become a buzzword. Neural networks are among the most popular algorithms in the field of artificial intelligence, and on many tasks they achieve higher accuracy than classical machine learning algorithms.

Neural networks have proven themselves in complex problem solving and research. This is one of the reasons artificial intelligence is considered a world-changing innovation.

For instance, have you ever wondered how Google voice search and Siri work?

In simple terms, the voice search mechanism behind such technology works on sequential data.

The algorithm that makes it possible to process sequential data efficiently is the recurrent neural network (RNN).

An RNN can remember earlier inputs because it has an internal memory. This makes RNNs well suited to machine learning problems that involve sequential data.

This article explores RNNs, which are among the algorithms that have been instrumental in the success of deep learning in recent years.

Let's start the discussion with a high-level overview of RNN.

Recurrent Neural Network

Recurrent Neural Networks (RNNs) are among the fundamental and most powerful types of neural networks. They have delivered promising results in a variety of applications, which has made them immensely popular.

The primary idea behind RNN is to process sequential data efficiently. RNN differs from traditional neural networks due to the concept of internal memory.

Although RNNs have come into prominence only over the past few years, they are relatively old and have existed since the 1980s. They have come to the forefront because we now have more computational power along with the large volumes of data generated in recent times.

You must be wondering how an internal memory is helpful.

Thanks to their internal memory, RNNs are capable of remembering essential information about the input they have received. This is crucial for predicting outcomes more precisely.

All this terminology may have raised another question in your mind, and I have got you covered.

What is sequential data?

Several types of data are sequential, such as:

Time series
Speech data
Text data
Financial data
Audio data
Video data

These are categorized as sequential data because they are ordered: each element follows, and is related to, the ones before it.

RNNs can gain more in-depth insight into a sequence and its context from such datasets to derive significant meaning and arrive at an accurate prediction as per the targeted problem at hand.

Let’s dive deeper and have a look at how recurrent neural networks (RNNs) work.

How does the RNN model work?

Typically, a traditional neural network processes one input and moves on to the next without considering their order. Sequential data, on the other hand, must be processed in a specific order to be understood correctly.

A feed-forward network is unable to comprehend a sequence because it treats each input as independent. In time series data, by contrast, each input depends on the previous ones.

At a high level, the architecture of an RNN resembles that of other artificial neural networks, such as a convolutional neural network (CNN).

Broadly, a recurrent neural network comprises an input layer, a hidden layer, and an output layer.

These layers operate in a fixed order.

The input layer receives the data, typically after preprocessing, and passes it to the hidden layer.

The hidden layer applies weights and activation functions to extract useful information from the data. Finally, the information is sent to the output layer to produce the result.

To understand RNN, you will need to grasp the idea of a feed-forward network.

Neural Networks Comparison

The illustration above represents the difference between a feed-forward neural network and a recurrent neural network.

In a feed-forward neural network, information can move in one direction only: from the input layer to the hidden layer and then to the output layer. Note that the information moves straight through, and no node is visited a second time.

As a feed-forward neural network considers only the current input, it has no notion of what happened in the past beyond what it learned during training.

An RNN works quite differently.

In an RNN, information passes through a loop: the hidden state computed at one step is fed back in at the next step, so each decision depends on the current input and on what came before. An RNN also shares the same weights and biases across all time steps.

As a result, inputs that a feed-forward network would treat as independent become dependent on one another.
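The loop can be written down in a few lines. The sketch below assumes the common formulation in which the new hidden state is tanh(w_x * x_t + w_h * h_prev + b), and it uses single numbers instead of vectors and matrices to keep things readable. The parameter values are arbitrary and chosen only for illustration.

```python
import math

# Hypothetical scalar parameters, shared across every time step.
W_X = 0.8  # weight applied to the current input
W_H = 0.5  # weight applied to the previous hidden state
B = 0.0    # bias

def rnn_step(x_t, h_prev):
    """One pass through the loop: mix the current input with the previous state."""
    return math.tanh(W_X * x_t + W_H * h_prev + B)

def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the loop and collect the hidden state at each step."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        states.append(h)
    return states

# Both sequences end with the same input (0.0), but their histories differ.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a[-1], states_b[-1])
```

The final input is identical in both runs, yet the final hidden states differ, because `states_a` still carries a trace of the 1.0 it saw at the first step. That trace is the "memory".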

The loop in an RNN is what preserves information in its memory. If you are wondering where this memory comes from, here is the secret.

In a plain RNN, the memory is the hidden state that is carried from one step to the next. Long short-term memory (LSTM), a widely used RNN variant covered later in this article, extends how long that information can be retained. RNNs have been widely used for machine translation, speech recognition, conversational AI (chatbots), and several similar applications.

One of the most popular technologies that has used RNNs at its core is Google Translate.

Different types of Recurrent Neural Networks (RNNs)

RNNs come in different types based on the number of inputs relative to the number of outputs. The various types of RNNs are described below.

One to one
One to many
Many to one
Many to many

Types of RNN

One-to-one

This is sometimes called the vanilla mode: a single input, such as a word or an image, produces a single output value. All traditional neural networks fall into this category. A simple spam classifier that maps one fixed-size input to one label also falls under this category.

One-to-many

A single input is used to create multiple outputs. A popular application for one to many is music generation.

Many-to-one

Several inputs are used to create a single output. An example is sentiment analysis.

Many-to-many

Several inputs are used to generate several outputs. Named entity recognition is a well-known example of this category.

The diagrammatic representation displays the various types of RNNs that have been discussed in the previous section.
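In code, the difference between these types mostly comes down to which inputs you feed in and which outputs you keep. The sketch below reuses a scalar recurrent loop with arbitrary, made-up weights. The one-to-many variant shown here, which feeds the single input at the first step and zeros afterwards, is only one of several conventions.

```python
import math

def run_rnn(inputs, w_x=0.8, w_h=0.5, w_y=1.5):
    """Return one output per time step from a scalar RNN (hypothetical weights)."""
    h = 0.0
    outputs = []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)  # update the hidden state
        outputs.append(w_y * h)             # read an output from the state
    return outputs

def many_to_many(inputs):
    """Keep every step's output, e.g. one tag per word."""
    return run_rnn(inputs)

def many_to_one(inputs):
    """Keep only the final output, e.g. one sentiment score per review."""
    return run_rnn(inputs)[-1]

def one_to_many(x, steps):
    """Feed a single input, then let the loop keep producing outputs."""
    return run_rnn([x] + [0.0] * (steps - 1))

seq = [0.2, -0.1, 0.4]
print(many_to_many(seq))
print(many_to_one(seq))
print(one_to_many(1.0, steps=4))
```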

We have discussed how information flows from one layer to another. But how does backpropagation reduce the loss and arrive at optimized weights?

How Backpropagation works in RNN

Training an RNN is very similar to training any other neural network that you may have come across. It relies on the backpropagation algorithm.

The objective of backpropagation is to go back through the neural network and compute the partial derivative of the error with respect to each weight. We can then subtract a small multiple of each derivative from the corresponding weight.

These derivatives are used by gradient descent to minimize a given loss function. The weights are adjusted in the direction that decreases the error.
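The update rule is easier to see on a toy problem. The sketch below minimizes a made-up one-weight loss, (w - 3)^2, whose derivative is known in closed form. The loss function, learning rate, and step count are all assumptions chosen for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly move w a small step against the gradient."""
    for _ in range(steps):
        w = w - learning_rate * loss_gradient(w)
    return w

w_final = gradient_descent(w=0.0)
print(w_final, loss(w_final))
```

In a real network, backpropagation supplies the gradient for every weight at once, but each weight is updated with this same rule.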

This is how a neural network proceeds during a training process. Backpropagation through time is a way of performing backpropagation on an unrolled RNN. Unrolling allows you to visualize and understand the process within the network.

In practice, deep learning frameworks already take care of backpropagation for you when you implement an RNN.
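Even so, writing backpropagation through time by hand once can make it less mysterious. The sketch below assumes a scalar RNN with h_t = tanh(w_x * x_t + w_h * h_prev), a squared-error loss on the final hidden state only, and made-up inputs and weights. It walks backwards over the unrolled steps and then compares the result with a numerical estimate of the same gradients.

```python
import math

def forward(xs, w_x, w_h):
    """Unroll the RNN and return every hidden state, starting with h_0 = 0."""
    hs = [0.0]
    for x_t in xs:
        hs.append(math.tanh(w_x * x_t + w_h * hs[-1]))
    return hs

def loss_fn(xs, target, w_x, w_h):
    """Squared error between the final hidden state and the target."""
    return 0.5 * (forward(xs, w_x, w_h)[-1] - target) ** 2

def bptt(xs, target, w_x, w_h):
    """Backpropagation through time: walk the unrolled steps in reverse."""
    hs = forward(xs, w_x, w_h)
    grad_w_x, grad_w_h = 0.0, 0.0
    dh = hs[-1] - target                 # d(loss) / d(h_T)
    for t in range(len(xs), 0, -1):
        dz = dh * (1.0 - hs[t] ** 2)     # back through tanh at step t
        grad_w_x += dz * xs[t - 1]       # shared weight: contributions add up
        grad_w_h += dz * hs[t - 1]
        dh = dz * w_h                    # pass the gradient to step t - 1
    return grad_w_x, grad_w_h

xs, target = [0.5, -0.3, 0.8], 0.2
w_x, w_h = 0.7, 0.4
grad_w_x, grad_w_h = bptt(xs, target, w_x, w_h)

# Numerical check with central differences.
eps = 1e-6
numeric_w_x = (loss_fn(xs, target, w_x + eps, w_h)
               - loss_fn(xs, target, w_x - eps, w_h)) / (2 * eps)
numeric_w_h = (loss_fn(xs, target, w_x, w_h + eps)
               - loss_fn(xs, target, w_x, w_h - eps)) / (2 * eps)
print(grad_w_x, numeric_w_x)
print(grad_w_h, numeric_w_h)
```

Because the same two weights are used at every time step, their gradients are sums over all steps. That is the main difference from backpropagation in a feed-forward network.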

Recurrent Neural Networks Applications

We learned how RNNs work, which brings up the question: where can we use recurrent neural networks?

Applications for recurrent neural networks

RNNs have shown great potential as a reliable type of neural network. Over the years, numerous advancements have produced state-of-the-art technologies.

Let’s highlight some of the areas where RNNs are widely preferred.

Speech Recognition

You may be surprised to discover that some of the most popular personal assistants are powered by speech recognition technology. It is used in Google Assistant, Amazon's Alexa, Apple's Siri, and smart driving assistance systems as well.

So why am I mentioning speech recognition, and how is it linked to RNNs?

You may have tried giving voice commands like “Hey Alexa, what is the temperature today?”

Whenever you communicate with a personal assistant, the smart system comprehends your voice command and gives you an answer based on your input.

Recognizing these voice commands and interpreting their meaning to produce an accurate output has been made possible by algorithms such as RNNs.

Image Captioning

Google Lens.

Does this ring a bell?

Have you noticed carefully how Google Lens operates?

When you feed it an image, it automatically tells you what that image shows. Some image processing applications, face detection among them, can also incorporate recurrent architectures, usually alongside convolutional networks.

For example, if you provide an image of a car as input, Google Lens returns the car along with the brand name of the manufacturer and some related car models that look similar.

This is possible with the help of RNN. The process of assigning automated captions to an image is called image captioning.

Machine Translation

Machine translation automates the task of translating between languages. This is possible with the help of deep learning technologies. RNNs are useful here because they learn sequential patterns from a dataset.

Translators such as Google Translate and grammar checking tools are examples of natural language processing (NLP) applications that have used RNNs as one of their principal algorithms to deliver accurate results.

Sentiment Analysis

Sentiment analysis is among the most common applications in the field of natural language processing. This is a method to identify whether a writer’s viewpoint towards a topic is positive or negative based on a computational analysis of opinions expressed in the form of comments or statements.

An example of such a scenario could be finding a movie rating based on the comments left by people who have watched the movie. To get you started, here is a link to an example of sentiment analysis using RNN.

Advantages and disadvantages of RNNs

Advantages of Recurrent Neural Network

RNN models are ideal for situations where information must be processed in sequence. One popular application is predicting the next word, where we need to remember the previous words.
Recurrent neural networks can also be combined with convolutional layers to extend the effective pixel neighbourhood.

Disadvantages of Recurrent Neural Network

Computationally, RNNs are hard to train: each step must wait for the previous one, and gradients must flow back through every time step.
With tanh or ReLU as the activation function, it is difficult for an RNN to retain information across very long sequences.

We discussed the advantages of recurrent neural networks, and we also discussed the disadvantages of RNN. Now let’s look at 2 key challenges in using recurrent neural networks along with the workaround for these issues.

Major obstacles of RNNs

RNNs face two types of challenges. However, to understand them clearly, you need to understand the basic concept of a gradient.

A gradient measures how much the output of a function changes when its inputs are changed slightly. If you think of the gradient as the slope of a function, then a larger gradient means a steeper slope.

A steeper slope lets a model learn faster. Conversely, if the slope is zero, the model stops learning. In training, the gradient indicates how much the error changes with respect to a change in the weights.

Exploding Gradient

This occurs when the gradients grow extremely large as they are propagated back through many time steps. The algorithm then makes excessively large updates to the weights, and training becomes unstable.

Vanishing Gradient

The second challenge, the vanishing gradient, occurs when the gradient values become too small. This causes the model to stop learning or to take far longer to produce a result.

This problem has been tackled with the introduction of LSTM.
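Both problems come from multiplying by the same factor again and again as the gradient travels back through time. The sketch below assumes a simplified linear scalar recurrence, h_t = w_h * h_prev + x_t, so that the gradient is scaled by exactly w_h at every step. Real RNNs also multiply by the derivative of the activation function, which this toy version leaves out.

```python
def gradient_factor(w_h, steps):
    """Factor by which a gradient is scaled after flowing back `steps` time steps
    through the linear recurrence h_t = w_h * h_prev + x_t."""
    factor = 1.0
    for _ in range(steps):
        factor *= w_h
    return factor

vanishing = gradient_factor(0.5, 50)  # recurrent weight below 1: shrinks toward zero
exploding = gradient_factor(1.5, 50)  # recurrent weight above 1: grows very large
print(vanishing, exploding)
```

After only 50 steps, one factor is practically zero and the other is in the hundreds of millions, which is why plain RNNs struggle with long sequences.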

Long Short-Term Memory (LSTM)

Long short-term memory, commonly known as LSTM, is responsible for memory extension. LSTM forms the building units for the layers of an RNN. The purpose of LSTM is to enable RNNs to memorize inputs for an extended period.

Because it has a memory, an LSTM can read, write, and delete information from it, much like a computer's memory. The gated cell in an LSTM network decides whether an input should be stored or erased based on the importance of the information, which it learns through weights.

Over time, the algorithm learns to judge the importance of information more precisely. An LSTM has three gates: the input gate, the forget gate, and the output gate.

The input gate determines whether to let new inputs in, whereas the forget gate deletes the information that is not relevant. The output gate controls how much of the stored information influences the output at the current time step.
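Here is a minimal sketch of a single LSTM step, following the commonly used formulation with a forget gate, an input gate, a candidate value, and an output gate. To stay readable it uses single numbers rather than vectors, and every parameter value is made up for illustration.

```python
import math

def sigmoid(z):
    """Gate activation: 0 means fully closed, 1 means fully open."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scalar parameters per gate: (weight on x_t, weight on h_prev, bias).
PARAMS = {
    "forget":    (0.5, 0.1, 1.0),
    "input":     (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output":    (0.4, 0.1, 0.0),
}

def preactivation(name, x_t, h_prev):
    """Weighted sum of the current input and previous hidden state for one gate."""
    w_x, w_h, b = PARAMS[name]
    return w_x * x_t + w_h * h_prev + b

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step: returns the new hidden state h and cell state c."""
    f = sigmoid(preactivation("forget", x_t, h_prev))       # how much old memory to keep
    i = sigmoid(preactivation("input", x_t, h_prev))        # how much new information to let in
    g = math.tanh(preactivation("candidate", x_t, h_prev))  # the new information itself
    o = sigmoid(preactivation("output", x_t, h_prev))       # how much memory to expose
    c = f * c_prev + i * g                                  # update the cell memory
    h = o * math.tanh(c)                                    # output for this time step
    return h, c

# Show one input, then nothing: the cell state still remembers it.
h, c = 0.0, 0.0
for x_t in [1.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x_t, h, c)
    print(round(h, 4), round(c, 4))
```

The cell state `c` is updated by addition rather than being repeatedly squashed, which is one reason an LSTM can hold on to information longer than a plain RNN.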

Conclusion

I hope this article gave you a head start on the concepts of RNN and a clear understanding of the building blocks of recurrent neural networks.

Deep learning models are continuously evolving, and sub-fields of artificial intelligence, such as natural language processing, have gained center-stage in bringing innovations to our doorstep.

RNNs are very useful, and with them, a robust deep learning model can be built that is capable of high performance.

It is important to note that RNN architectures continue to evolve. Take care to choose the right variant for your targeted problem.