How Multinomial Logistic Regression Model Works In Machine Learning

Multinomial Logistic Regression model

Multinomial Logistic Regression Model

How the multinomial logistic regression model works

In the pool of supervised classification algorithms, the logistic regression model is the first most algorithm to play with. This classification algorithm is again categorized into different categories. These categories are purely based on the number of target classes.

If the logistic regression model used for addressing the binary classification kind of problems it’s known as the binary logistic regression classifier. Whereas the logistic regression model used for multiclassification kind of problems, it’s called the multinomial logistic regression classifier.

As we discussed each and every block of binary logistic regression classifier in our previous article. Now we use the binary logistic regression knowledge to understand in details about, how the multinomial logistic regression classifier works.

I recommend first to check out the how the logistic regression classifier works article and the Softmax vs Sigmoid functions article before you read this article.

Learn each and every stage of multinomial logistic regression classifier. Click To Tweet

Before we start the drive, let’s look at the table of contents of this article.

Table of Contents:

  • What is Logistic Regression?
  • What is Multinomial Logistic Regression?
  • Multinomial Logistic Regression Example
  • How Multinomial Logistic Regression model works.
    • Logits
    • Softmax Function
      • Properties
      • Implementation in Python
    • Cross Entropy
  • Parameters Optimization
    • Loss Function
  • Conclusion

What is Logistic Regression?

The logistic regression model is a supervised classification model. Which uses the techniques of the linear regression model in the initial stages to calculate the logits (Score). So technically we can call the logistic regression model as the linear model. In the later stages uses the estimated logits to train a classification model.  The trained classification model performs the multi-classification task.

If you were not aware of the logits and the basic linear regression model techniques don’t be frightened. In the coming sections. We are going to learn each and every block of multinomial logistic regression from inputs to the target output representation.

As we discussed earlier the logistic regression models are categorized based on the number of target classes and uses the proper functions like sigmoid or softmax functions to predict the target class.

In short:

  • Sigmoid function: used in the logistic regression model for binary classification.
  • Softmax function: used in the logistic regression model for multiclassification.

To learn more about sigmoid and softmax functions checkout difference between softmax and sigmoid functions article.

What is Multinomial Logistic Regression?

Multinomial logistic regression is also a classification algorithm same like the logistic regression for binary classification. Whereas in logistic regression for binary classification the classification task is to predict the target class which is of binary type. Like Yes/NO, 0/1, Male/Female.

When it comes to multinomial logistic regression. The idea is to use the logistic regression techniques to predict the target class (more than 2 target classes).

The underline technique will be same like the logistic regression for binary classification until calculating the probabilities for each target. Once the probabilities were calculated. We need to transfer them into one hot encoding and uses the cross entropy methods in the training process for calculating the properly optimized weights.

Multinomial logistic regression works well on big data irrespective of different areas. Surprisingly it is also used in human resource development and more in depth details about how the big data is used in human resource development can found in this article.

We are going to learn the whole process of multinomial logistic regression from giving inputs to the final one hot encoding in the upcoming sections of this article.

Before that, Let’s quickly check few examples to understand what kind of problems we can solve using the multinomial logistic regression.

Multinomial Logistic Regression Example

Using the multinomial logistic regression. We can address different types of classification problems. Where the trained model is used to predict the target class from more than 2 target classes. Below are few examples to understand what kind of problems we can solve using the multinomial logistic regression.

Now let’s move on to the key part this article “understand how the multinomial logistic regression model works’

How Multinomial Logistic Regression model works

Multinomial Logistic Regression Classifier

Multinomial Logistic Regression Classifier

The above image illustrates the workflow of multinomial logistic regression classifier. To simply the understanding process let’s split the multinomial logistic classifier techniques into different stages from inputs to the outputs. Then we can discuss each stage of the classifier in detail.

Multinomial Logistic Regression Workflow/ Stages:

  • Inputs
  • Linear model
  • Logits
  • Softmax Function
  • Cross Entropy
  • One-Hot-Encoding


The inputs to the multinomial logistic regression are the features we have in the dataset. Suppose if we are going to predict the Iris flower species type, the features will be the flower sepal length, width and petal length and width parameters will be our features. These features will treat as the inputs for the multinomial logistic regression.

The keynote to remember here is the features values always numerical. If the features are not numerical, we need to convert them into numerical values using the proper categorical data analysis techniques.

Just a simple example: If the feature is color and having different attributes of the color features are RED, BLUE, YELLOW, ORANGE. Then we can assign an integer value to each attribute of the features like for RED we can assign 1. For BLUE we can assign the value 2 likewise of the other attributes for the color feature. Later we can use the numerically converted values as the inputs for the classifier.

Linear Model

The linear model equation is the same as the linear equation in the linear regression model. You can see this linear equation in the image. Where the X is the set of inputs, Suppose from the image we can say X is a matrix. Which contains all the feature( numerical values) X = [x1,x2,x3]. Where W is another matrix includes the same input number of weights W = [w1,w2,w3].

In this example, the linear model output will be the w1*x1,  w2*x2, w3*x3  

The weights w1, w2, w3, w4 will update in the training phase. We will learn about this in the parameters optimization section of this article.


The Logits also called as scores. These are just the outputs of the linear model. The Logits will change with the changes in the calculated weights.

Softmax Function

The Softmax function is a probabilistic function which calculates the probabilities for the given score. Using the softmax function return the high probability value for the high scores and fewer probabilities for the remaining scores. This we can observe from the image. For the logits 0.5, 1.5, 0.1 the calculated probabilities using the softmax function are 0.2, 0.7, 0.1

For the Logit 1.5, we are getting the high probability value 0.7 and very less probability value for the remaining Logits 0.5 and 0.1

Keynote: The Logits and the probabilities in the image were just for the purpose of understanding the multinomial logistic regression model. The values are not the real values computed using the softmax function.

Softmax Function Properties

Below are the few properties of softmax function.

  • The calculated probabilities will be in the range of 0 to 1.
  • The sum of all the probabilities is equals to 1.

Implementing Softmax Function In Python

Now let’s implement the softmax function in Python

Script Output

If we observe the function output for the input value 6 we are getting the high probabilities. This is what we can expect from the softmax function. Later in classification task, we can use the high probability value for predicting the target class for the given input features.

Cross Entropy

The cross entropy is the last stage of multinomial logistic regression. Uses the cross-entropy function to find the similarity distance between the probabilities calculated from the softmax function and the target one-hot-encoding matrix.

Before we learn more about Cross Entropy, let’s understand what it is mean by One-Hot-Encoding matrix.


One-Hot Encoding is a method to represent the target values or categorical attributes into a binary representation. From this article main image, where the input is the dog image, the target having 3 possible outcomes like bird, dog, cat.  Where you can find the one-hot-encoding matrix like [0, 1, 0].

The one-hot-encoding matrix is so simple to create. For every input features (x1, x2, x3) the one-hot-encoding matrix is with the values of 0 and the 1 for the target class. The total number of values in the one-hot-encoding matrix and the unique target classes are the same.

Suppose if we have 3 input features like x1, x2, and x3 and one target variable (With 3 target classes). Then the one-hot-encoding matrix will have 3 values. Out of 3 values, one value will be 1 and all other will be 0s. 

You will know where to place the 1 and where to place the 0 value from the training dataset. Let’s take one observation from the training dataset which contains values for x1, x2, x3 and what will be the target class for that observation. The one-hot-encoding matrix will be having 1 for the target class for that observation and 0s for other.

Cross Entropy

The Cross-entropy is a distance calculation function which takes the calculated probabilities from softmax function and the created one-hot-encoding matrix to calculate the distance. For the right target class, the distance value will be less, and the distance values will be larger for the wrong target class.

With this, we discussed each stage of the multinomial logistic regression. All the stages/ workflow will happen for each observation in the training set.

Observation: Single observation is just the single row values from the traging set. Which contains the features and the correspoinding target class.

For each observation in the training set will pass through the all the stages of multinomial logistic regression and the proper weights W (w1, w2, w3) values will be computed. How the weights calculated and update the weights is know as the Parameters Optimization.

Parameters Optimization

The expected output after training the multinomial logistic regression classifier is the calculated weights. Later the calculated weights will be used for the prediction task.

This Parameters optimization is an iteration process where the calculated weights for each observation used to calculate the cost function which is also known as the Loss function.

The Iteration process ends when the loss function value is less or significantly negligible. Now let’s learn about this loss function to sign off from this lengthy article 🙂

Loss function

The input parameters for the loss function is the calculated weights and all the training observations. The function calculates the distance between the predicted class using the calculated weights for all the features in the training observation and the actual target class.

If the loss function value is fewer means with the estimated weights, we are confident to predict the target classes for the new observations (From test set).  In the case of high loss function value, the process of calculating the weights will start again with derivated weights of the previously calculated weights.

The process will continue until the loss function value is less.

This is the whole process of multinomial logistic regression. If you are thinking, it will be hard to implement the loss function and coding the entire workflow. Don’t frighten. We were so lucky to have the machine learning libraries like scikit-learn. Which performs all this workflow for us and returns the calculated weights.

In the next article, we are going to implement the logistic regression model using the scikit-learn library to perform the multiclassification task.


This article gives the clear explanation on the each stage of multinomial logistic regression. Below are the discussed workflow or stages.

  • Inputs
  • Linear model
  • Logits
  • Softmax Function
  • Cross Entropy
  • One-Hot-Encoding


Follow us:


I hope you like this post. If you have any questions, then feel free to comment below.  If you want me to write on one particular topic, then do tell it to me in the comments below.

Related Data Science Courses

  • […] world classification applications. Will learn the fundamental difference between the binary and multinomial classification techniques. You will be introduced to the concepts like logistic regression, support vector machine […]

  • […] the implementation of logistic regression in python, as the background concepts explained in how the logistic regression model works […]

  • >