Difference Between Softmax Function and Sigmoid Function

March 7, 2017 Saimadhu Polamuri

Softmax Function Vs Sigmoid Function

When learning logistic regression, a common source of confusion is the function used to calculate the probabilities, since those calculated probabilities are what the model uses to predict the target class. The two functions we hear about most often are the softmax function and the sigmoid function.

Both functions serve the same purpose at a functional level (helping to predict the target class), but there are noticeable mathematical differences between them, and these differences play a vital role in how the functions are used in deep learning and other fields.

So in this article, we are going to learn more about the fundamental differences between these two functions and their usage.

Before we begin, let’s quickly look at the table of contents.

Table of Contents:

What is Sigmoid Function?
Properties of Sigmoid Function
Sigmoid Function Usage
Implementing Sigmoid Function In Python
Creating Sigmoid Function Graph
What is Softmax Function?
Properties of Softmax Function
Softmax Function Usage
Implementing Softmax Function In Python
Creating Softmax Function Graph
Difference Between Sigmoid Function and Softmax Function
Conclusion

What is Sigmoid Function?

Sigmoid Function

Mathematically, the sigmoid function takes any real number as input and returns an output value in the range of 0 to 1. Under other conventions, a sigmoid-shaped function such as the hyperbolic tangent is used instead, and then the output falls in the range of -1 to 1.

The sigmoid function produces an S-shaped curve. Such curves also appear in statistics as cumulative distribution functions, whose output ranges from 0 to 1.
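
For reference, the sigmoid described above is usually written as in the first line below. The second line is the standard identity linking it to the hyperbolic tangent, one common sigmoid-shaped function whose output lies between -1 and 1. This is offered as a notational sketch rather than something taken from the article's images.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad 0 < \sigma(x) < 1

\tanh(x) = 2\,\sigma(2x) - 1, \qquad -1 < \tanh(x) < 1
```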

Properties of Sigmoid Function

The sigmoid function returns a real-valued output.
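The first derivative of a sigmoid function is either non-negative everywhere or non-positive everywhere, so the curve is monotonic. For the sigmoid used in this article, the derivative is always positive.
- Non-Negative: A number that is greater than or equal to zero.
- Non-Positive: A number that is less than or equal to zero.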
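
The derivative property can be checked numerically. The sketch below assumes the closed form sigma'(x) = sigma(x) * (1 - sigma(x)) and compares it against a central finite difference; the helper names and the input grid are illustrative choices, not part of the original code.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid for a NumPy array."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    """Closed form of the derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


xs = np.linspace(-10.0, 10.0, 201)
analytic = sigmoid_derivative(xs)

# Central finite difference as an independent check of the closed form
h = 1e-5
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)

print("Closed form matches finite difference:", np.allclose(analytic, numeric, atol=1e-8))
print("Derivative is positive everywhere on the grid:", bool(analytic.min() > 0))
print("Derivative never exceeds 0.25:", bool(analytic.max() <= 0.25))
```

The derivative peaks at x = 0 and shrinks toward zero in both tails, which is why the curve flattens at the top and the bottom.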

Sigmoid Function Usage

The sigmoid function is used for binary classification in the logistic regression model.
When creating artificial neurons, the sigmoid function is used as the activation function.
In statistics, sigmoid-shaped graphs are common as cumulative distribution functions.
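
To make the binary classification usage concrete, here is a minimal sketch. The weights, bias, sample, and the 0.5 threshold are hypothetical values chosen only for illustration; a real logistic regression model would learn its parameters from data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical parameters for a two-feature model (not learned from real data)
weights = np.array([0.8, -0.4])
bias = -0.2


def predict_proba(features):
    """Probability of the positive class for a single sample."""
    return sigmoid(np.dot(weights, features) + bias)


def predict_class(features, threshold=0.5):
    """Return 1 if the probability reaches the threshold, otherwise 0."""
    return int(predict_proba(features) >= threshold)


sample = np.array([2.0, 1.0])
print("Probability of class 1: {}".format(predict_proba(sample)))
print("Predicted class: {}".format(predict_class(sample)))
```

The sigmoid turns the weighted sum into a single probability, and the threshold turns that probability into one of the two classes.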

Implementing Sigmoid Function In Python

Now let’s implement the sigmoid function in Python.

# Required Python Package
import numpy as np

def sigmoid(inputs):
    """
    Calculate the sigmoid for the given inputs (array)
    :param inputs: list (or array) of numeric values
    :return: list of sigmoid scores, one per input value
    """
    sigmoid_scores = [1 / float(1 + np.exp(- x)) for x in inputs]
    return sigmoid_scores


sigmoid_inputs = [2, 3, 5, 6]
print("Sigmoid Function Output :: {}".format(sigmoid(sigmoid_inputs)))

The above is the implementation of the sigmoid function.

The function takes a list of values as its input parameter.
Each element in the list is treated as an input to the sigmoid function, and the output value is calculated for it.
The expression 1 / float(1 + np.exp(- x)) is what calculates the sigmoid scores.
Next, we pass the list sigmoid_inputs, which holds the values 2, 3, 5, 6, to the function we implemented to get the sigmoid scores.
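
The list comprehension above works well for small positive inputs. For large negative inputs, np.exp(- x) can overflow and emit warnings. A possible alternative, sketched here under the assumption that the inputs fit in a NumPy array, is to branch on the sign of each value. The name stable_sigmoid is hypothetical and not from the original post.

```python
import numpy as np


def stable_sigmoid(inputs):
    """Vectorized sigmoid that avoids overflow in np.exp for large |x|."""
    x = np.asarray(inputs, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so the usual form is safe
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out


print("Stable sigmoid output :: {}".format(stable_sigmoid([2, 3, 5, 6])))
print("Extreme inputs :: {}".format(stable_sigmoid([-1000.0, 0.0, 1000.0])))
```

Both branches compute the same mathematical function; only the arrangement of the exponentials differs.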

Script Output

Sigmoid Function Output :: [0.8807970779778823, 0.9525741268224334, 0.9933071490757153, 0.9975273768433653]

Creating Sigmoid Function Graph

Now let’s use the above function to create the graph to understand the nature of the sigmoid function.

We are going to pass a list which contains the integers from 0 to 20.
We will compute the sigmoid scores for the list we passed.
Then we will use the output values to draw the graph.

# Required Python Packages
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(inputs):
    """
    Calculate the sigmoid for the given inputs (array)
    :param inputs: list (or array) of numeric values
    :return: list of sigmoid scores, one per input value
    """
    sigmoid_scores = [1 / float(1 + np.exp(- x)) for x in inputs]
    return sigmoid_scores


def line_graph(x, y, x_title, y_title):
    """
    Draw line graph with x and y values
    :param x:
    :param y:
    :param x_title:
    :param y_title:
    :return:
    """
    plt.plot(x, y)
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    plt.show()


graph_x = list(range(0, 21))
graph_y = sigmoid(graph_x)

print("Graph X readings: {}".format(graph_x))
print("Graph Y readings: {}".format(graph_y))

line_graph(graph_x, graph_y, "Inputs", "Sigmoid Scores")

We create a graph_x list which contains the integers from 0 to 20.
Next, in the graph_y list, we store the calculated sigmoid scores for the given graph_x inputs.
Finally, we call the line_graph function, which takes the x values, the y values, and the axis titles to create the line graph.

Script Output

Graph X readings: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

Graph Y readings: [0.5, 0.7310585786300049, 0.8807970779778823, 0.9525741268224334, 0.9820137900379085, 0.9933071490757153, 0.9975273768433653, 0.9990889488055994, 0.9996646498695336, 0.9998766054240137, 0.9999546021312976, 0.999983298578152, 0.9999938558253978, 0.999997739675702, 0.9999991684719722, 0.999999694097773, 0.9999998874648379, 0.9999999586006244, 0.9999999847700205, 0.9999999943972036, 0.9999999979388463]

Graph

On successfully running the above code, the image below will appear on your screen. If the code fails on your system, check your machine learning packages setup.

Sigmoid graph

From the above graph, we can observe that as the input value increases, the sigmoid score increases toward 1 without ever exceeding it. The values that appear to touch the top of the graph are values just below 1, such as 0.99 and 0.999.
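
The graph above covers only non-negative inputs, so it shows the upper half of the S-shape. As a small sketch, the same sigmoid function can be evaluated on a symmetric range to see the lower half too. The range -10 to 10 is an arbitrary choice made for illustration.

```python
import numpy as np


def sigmoid(inputs):
    """Same list-based sigmoid as in the article."""
    return [1 / float(1 + np.exp(-x)) for x in inputs]


graph_x = list(range(-10, 11))
graph_y = sigmoid(graph_x)

# Print the input next to its sigmoid score
for x, y in zip(graph_x, graph_y):
    print("{:>3d} -> {:.6f}".format(x, y))
```

The scores rise from near 0 for very negative inputs, pass through 0.5 at an input of 0, and level off near 1. Passing graph_x and graph_y to the line_graph function from the script above would draw the full curve.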

What is Softmax Function?

Softmax Function

The softmax function calculates the probability distribution of an event over ‘n’ different events. In general terms, this function calculates the probability of each target class over all possible target classes. The calculated probabilities are later used to determine the target class for the given inputs.

The main advantage of using softmax is the range of the output probabilities. Each probability lies between 0 and 1, and the sum of all the probabilities is equal to one. If the softmax function is used for a multi-class classification model, it returns the probability of each class, and the target class will have the highest probability.

The formula computes the exponential (e-power) of the given input value and the sum of the exponentials of all the values in the inputs. The ratio of the exponential of the input value to that sum is the output of the softmax function.
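
Written out, the description above corresponds to the usual softmax formula. The symbols z (the input vector) and the indices i and j are notation assumed here; n is the number of classes, as in the text.

```latex
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad i = 1, \dots, n
```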

Properties of Softmax Function

Below are a few properties of the softmax function.

The calculated probabilities will be in the range of 0 to 1.
The sum of all the probabilities is equal to 1.
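
Both properties can be checked on a few random score vectors. This is only a sketch: the seed, the vector length, and the number of trials are arbitrary choices.

```python
import numpy as np


def softmax(inputs):
    exps = np.exp(np.asarray(inputs, dtype=float))
    return exps / exps.sum()


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
for _ in range(5):
    scores = rng.normal(size=4)
    probs = softmax(scores)
    in_range = bool(np.all((probs > 0) & (probs < 1)))
    sums_to_one = bool(np.isclose(probs.sum(), 1.0))
    print(np.round(probs, 4), in_range, sums_to_one)
```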

Softmax Function Usage

Used in the multi-class logistic regression model.
When building neural networks, the softmax function is used at the layer level, most commonly in the output layer of a classifier.
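
As a rough sketch of the multi-class usage, the snippet below turns a vector of raw scores into probabilities and picks the most probable class. The class names and scores are hypothetical and stand in for the output of a trained model.

```python
import numpy as np


def softmax(inputs):
    exps = np.exp(np.asarray(inputs, dtype=float))
    return exps / exps.sum()


class_names = ["cat", "dog", "bird"]  # hypothetical labels
raw_scores = [1.0, 3.0, 0.5]          # hypothetical raw model outputs

probabilities = softmax(raw_scores)
predicted = class_names[int(np.argmax(probabilities))]

for name, prob in zip(class_names, probabilities):
    print("{}: {:.4f}".format(name, prob))
print("Predicted class: {}".format(predicted))
```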

Implementing Softmax Function In Python

Now let’s implement the softmax function in Python.

# Required Python Package
import numpy as np


def softmax(inputs):
    """
    Calculate the softmax for the given inputs (array)
    :param inputs: list (or array) of numeric values
    :return: NumPy array of probabilities that sum to 1
    """
    return np.exp(inputs) / float(sum(np.exp(inputs)))


softmax_inputs = [2, 3, 5, 6]
print("Softmax Function Output :: {}".format(softmax(softmax_inputs)))

Script Output

Softmax Function Output :: [ 0.01275478  0.03467109  0.25618664  0.69638749]

If we look at the function output, the input value 6 gets the highest probability. This is what we expect from the softmax function. Later, in a classification task, we can use the highest probability value to predict the target class for the given input features.
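
One practical caveat: np.exp overflows for large inputs, so the direct implementation can return nan values. A common workaround, sketched below, is to subtract the maximum input before exponentiating. The result is unchanged because the shift cancels in the ratio. The name stable_softmax is an assumption made for this example.

```python
import numpy as np


def stable_softmax(inputs):
    """Softmax with the maximum subtracted first to avoid overflow."""
    x = np.asarray(inputs, dtype=float)
    shifted = x - np.max(x)  # the largest exponent becomes 0
    exps = np.exp(shifted)
    return exps / np.sum(exps)


print("Stable softmax output :: {}".format(stable_softmax([2, 3, 5, 6])))
print("Large inputs :: {}".format(stable_softmax([1000, 1001, 1002])))
```

For the inputs 2, 3, 5, 6 this gives the same probabilities as the softmax function above, up to floating-point rounding.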

Creating Softmax Function Graph

Now let’s use the implemented Softmax function to create the graph to understand the behavior of this function.

We are going to create a list which contains the integers from 0 to 20.
Next, we are going to pass this list to the implemented function to calculate the scores.
To create the graph, we are going to use the list and the calculated scores.

# Required Python Packages
import numpy as np
import matplotlib.pyplot as plt


def softmax(inputs):
    """
    Calculate the softmax for the given inputs (array)
    :param inputs: list (or array) of numeric values
    :return: NumPy array of probabilities that sum to 1
    """
    return np.exp(inputs) / float(sum(np.exp(inputs)))


def line_graph(x, y, x_title, y_title):
    """
    Draw line graph with x and y values
    :param x:
    :param y:
    :param x_title:
    :param y_title:
    :return:
    """
    plt.plot(x, y)
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    plt.show()


graph_x = list(range(0, 21))
graph_y = softmax(graph_x)

print("Graph X readings: {}".format(graph_x))
print("Graph Y readings: {}".format(graph_y))

line_graph(graph_x, graph_y, "Inputs", "Softmax Scores")

Script Output

Graph X readings: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Graph Y readings: [ 1.30289758e-09 3.54164282e-09 9.62718331e-09 2.61693975e-08 7.11357976e-08 1.93367146e-07 5.25626399e-07 1.42880069e-06 3.88388295e-06 1.05574884e-05 2.86982290e-05 7.80098744e-05 2.12052824e-04 5.76419338e-04 1.56687021e-03 4.25919483e-03 1.15776919e-02 3.14714295e-02 8.55482149e-02 2.32544158e-01 6.32120559e-01]

Graph

Softmax Graph

The figure shows the fundamental property of the softmax function: the higher the input value, the higher its probability.
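
It may help to see that a softmax probability is relative: the same input value receives a different probability depending on the other values in the list. The sketch below adds one larger, hypothetical score (9) to the earlier inputs and compares the result with the sigmoid score, which ignores the other values.

```python
import numpy as np


def softmax(inputs):
    exps = np.exp(np.asarray(inputs, dtype=float))
    return exps / exps.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


alone = softmax([2, 3, 5, 6])
with_rival = softmax([2, 3, 5, 6, 9])  # 9 is an extra, hypothetical score

# Index 3 holds the probability assigned to the input value 6 in both lists
print("Softmax probability of 6 without the rival: {:.4f}".format(alone[3]))
print("Softmax probability of 6 with the rival: {:.4f}".format(with_rival[3]))
print("Sigmoid score of 6 (independent of other inputs): {:.4f}".format(sigmoid(6.0)))
```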

Difference Between Sigmoid Function and Softmax Function

Below are the differences between the sigmoid and softmax functions in tabular form.

	Softmax Function	Sigmoid Function
1	Used for multi-class classification in the logistic regression model.	Used for binary classification in the logistic regression model.
2	The probabilities sum to 1.	The probabilities need not sum to 1.
3	Used in different layers of neural networks.	Used as an activation function when building neural networks.
4	The highest input value gets a higher probability than all the other values.	A high input value gets a high score, but the score is computed independently of the other values.
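
The second row of the table can be verified directly, and the two functions can also be connected: for two classes, softmax over the scores [z, 0] gives the same probability as sigmoid(z). The snippet below is a sketch using the article's example inputs and an arbitrary z = 1.5.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def softmax(inputs):
    exps = np.exp(np.asarray(inputs, dtype=float))
    return exps / exps.sum()


scores = [2, 3, 5, 6]
print("Sum of sigmoid scores: {}".format(sigmoid(scores).sum()))
print("Sum of softmax scores: {}".format(softmax(scores).sum()))

# Two-class case: softmax([z, 0])[0] equals sigmoid(z)
z = 1.5  # arbitrary example score
two_class = softmax([z, 0.0])
print("Two-class softmax matches sigmoid: {}".format(bool(np.isclose(two_class[0], sigmoid(z)))))
```

This is why binary logistic regression can use the sigmoid alone, while the multi-class version needs softmax.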

Conclusion

In this article, you learned in detail about the two functions used in the logistic regression model. As a quick recap:

Softmax: Used for the multi-classification task.
Sigmoid: Used for the binary classification task.

22 Responses to “Difference Between Softmax Function and Sigmoid Function”

Nian
9 months ago

The first derivative of the sigmoid function will always be non-negative?
- Saimadhu Polamuri
  6 months ago
  
  The sigmoid function is given by:
  
  \sigma(x) = \frac{1}{1 + e^{-x}}
  
  The first derivative (or the gradient) of the sigmoid function with respect to \( x \) is:
  
  \sigma'(x) = \sigma(x)(1 - \sigma(x))
  
  Given that the output of the sigmoid function \sigma(x) always lies in the range (0, 1), the value of \sigma(x)(1 - \sigma(x)) will always be between 0 and 0.25 (non-inclusive of 0 and inclusive of 0.25).
  
  So, the first derivative of the sigmoid function is always non-negative and less than or equal to 0.25. It will never be exactly 0, but it approaches 0 as x goes to positive or negative infinity.
Tour of Never
4 years ago

Hello! I’m at work browsing your blog from my new iphone 4!
Just wanted to say I love reading your blog and look forward
to all your posts! Carry on the fantastic work!
- Saimadhu Polamuri
  4 years ago
  
  Hi,
  
  Thanks for your compliment.
  Happy learning!
chirag aggarwal
4 years ago

Explained in really a superb manner which is easy to understand for new-comers in ML field.
- Saimadhu Polamuri
  4 years ago
  
  Hi Chirag Aggarwal,
  
  Thanks for your compliment.
  
  Happy learning!
naga pavan kumar kalepu
5 years ago

Thanks Bro for information
- Saimadhu Polamuri
  4 years ago
  
  Hi Naga pavan,
  
  We are glad that the post has given you a good idea about the softmax and sigmoid functions.
  
  Thanks and happy learning.
Mirsch
5 years ago

Very good article! Could you explain why you choosed the bicycle and the motorbike for the sigmoid and softmax function (In the images). Is there a deeper meaning?
- Saimadhu Polamuri
  4 years ago
  
  Hi Mirsch,
  
  Yes, we have a deeper meaning for using bicycle and motorbike, the intention is softmax function best compared with sigmoid that the reason we have used both the motorbike and bicycle to showcase the same.
  
  Thanks and happy learning.
Harshal
5 years ago

Articles are very good, But the font is not properly visible. Please try changing font color
- Saimadhu Polamuri
  4 years ago
  
  Hi,
  
  Thanks for the compliment, the font is specific to the blog template we are using and we can’t change different font colors, sorry for that.
  
  We wish you happy learning.
Praful
5 years ago

Awesome post….
- Saimadhu Polamuri
  4 years ago
  
  Hi Praful,
  
  Thanks for the compliment.
  
  We wish you a very happy learning.
sajjad_afridi
6 years ago

Very helpful (Y)
- Saimadhu Polamuri
  6 years ago
  
  Hi Sajjad_afridi,
  
  Thanks for your compliment. 🙂
Praveen
6 years ago

In the statement, “In mathematical definition way of saying the sigmoid function take any range real number and returns the output value which falls in the range of 0 to 1. ”
I am not able to understand – “Based on the convention we can expect the output value in the range of -1 to 1.”
- Saimadhu Polamuri
  6 years ago
  
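  Hi Praveen,
  Yes, that sentence refers to a convention: some sigmoid-shaped functions, such as the hyperbolic tangent, are scaled so that the output falls in the range of -1 to 1, while the standard sigmoid outputs values from 0 to 1. These conventions are the way we use different functions.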
ML4ever
6 years ago

Wonderful and simple to understand. Thank you so much
- Saimadhu Polamuri
  6 years ago
  
  Hi Ml4ever,
  
  Thanks for your compliment 🙂 We were glad to know that the article helped you.
anon
7 years ago

good post. easy to follow and understand
- Saimadhu Polamuri
  7 years ago
  
  Hi Anon,
  Thanks for your compliment.