
Sigmoid function

Whether you’re building a neural network from scratch or using a pre-built library, knowing how the sigmoid function works and why it matters is crucial. Understanding how a neural network solves difficult problems requires a grasp of the sigmoid function. It has also served as a starting point for discovering other activation functions that produce effective and desirable outcomes in supervised deep learning.

This tutorial covers the sigmoid function and its application to example-based learning in neural networks, explaining step by step how it works.

Calculating the Sigmoid Function

Many machine learning tasks require converting a real number to a probability, and this is where sigmoid functions shine. When added as the final layer of a machine learning model, a sigmoid function can transform the network’s output into a probability score that is more manageable and intuitive.

The sigmoid function, a special case of the logistic function, is usually denoted sig(x) or σ(x). It is defined as:

σ(x) = 1/(1+exp(-x))
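The formula above translates directly into code. Here is a minimal sketch in Python (the function name `sigmoid` is our own choice):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5
```

Note that `math.exp(-x)` overflows for very negative x (below about -709 in double precision), so production code often uses a numerically stable variant or a library routine such as `scipy.special.expit`.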

The Sigmoid Function: What Is It?

Mathematically speaking, the curve of a Sigmoid function is recognizable as an S shape. The logistic function, the hyperbolic tangent, and the arctangent are just a few examples of widely used sigmoid functions.

Because all of these functions map the entire real line into a limited range, such as (0, 1) or (-1, 1), one common application is transforming a real number into a probability value.

Features and Characteristics of the Sigmoid Function

The graph of the sigmoid function is an S-shaped curve. Its derivative is bell-shaped: it peaks at x = 0 and decays towards zero as the input moves away from the origin in either direction.

Furthermore, it has the following characteristics:

Domain: (-∞, +∞)

Range: (0, 1)

σ(0) = 0.5

The function is monotonically increasing.

The function is continuous everywhere; it has no discontinuities.

The function is differentiable everywhere on its domain.

To evaluate the function numerically, only a small input range such as [-10, +10] matters in practice. At x = -10 the function is very close to 0, and for all values greater than +10 it is very close to 1.
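A quick numerical check of this claim, using a plain Python definition of the sigmoid:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# outside roughly [-10, +10] the function is numerically flat
for x in (-10, 0, 10):
    print(x, sigmoid(x))
```

σ(-10) is on the order of 4.5e-5 and σ(10) is about 0.99995, so the interesting behaviour is concentrated near the origin.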

Sigmoid functions have gained popularity in deep learning because they can be employed as activation functions in artificial neural networks. The activation behaviour of biological neurons served as inspiration.

The Squashing Property of the Sigmoid Function

Because its domain is the set of all real numbers and its range is the interval (0, 1), the sigmoid function is also known as a squashing function. The output always lies between 0 and 1, whether the input is a very large negative number or a very large positive number.

The Sigmoid Activation Function for Neural Networks

In artificial neural networks, the sigmoid function is utilized as an activation function. To quickly refresh your memory: in a single layer of a neural network, each layer computes a weighted sum of the previous layer’s outputs, applies an activation function to it, and passes the result to the next layer as input.

When the sigmoid is used as a neuron’s activation, the neuron’s output is guaranteed to lie between 0 and 1. Such a unit, whose output is a non-linear function of the weighted sum of its inputs, is known as a sigmoid unit.

Separability: Linear vs. Non-Linear

A problem is linearly separable when the boundary between the two categories is linear, such as a straight line in two dimensions. A problem that is not linearly separable necessitates a non-linear decision boundary.

The equation of a plane can be used to describe a linear decision boundary in three-dimensional space. The equation of a hyperplane describes the linear decision boundary for n-dimensional space.
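A linear decision boundary amounts to checking which side of the hyperplane w·x + b = 0 a point falls on. A minimal sketch (the function name `linear_decision` and the example weights are our own):

```python
def linear_decision(x, w, b):
    # classify a point by the sign of w·x + b
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

# hypothetical boundary x1 + x2 = 1.5 in two dimensions
print(linear_decision([2.0, 0.0], [1.0, 1.0], -1.5))  # 1
print(linear_decision([0.0, 0.0], [1.0, 1.0], -1.5))  # 0
```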

Explaining the Role of the Sigmoid Function in Neural Networks

The sigmoid function is useful in neural networks for learning complex decision functions, since its use produces non-linear decision boundaries.

A useful activation function for a neural network should be non-linear and monotonically increasing. This is why oscillating functions such as sin(x) and cos(x) are not used as activation functions. The activation function should also be defined and continuous everywhere on the real line, and it should be differentiable everywhere as well.

To learn a neural network’s weights, backpropagation algorithms typically employ gradient descent. Deriving this algorithm requires the derivative of the activation function.
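For the sigmoid, this derivative has a convenient closed form, σ'(x) = σ(x)(1 - σ(x)), which is why it is so cheap to use in backpropagation. A small sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # σ'(x) = σ(x) * (1 - σ(x)): reuses the forward value
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_derivative(0.0))  # peak value: 0.25
```

The derivative attains its maximum of 0.25 at x = 0 and shrinks as the input moves away from the origin.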

Problems with the sigmoid function

However, there are situations where the sigmoid function’s gradient makes gradient descent a poor choice. When the input is strongly negative or strongly positive, the gradient goes to zero (the function saturates), making it difficult for the model to keep learning.

For instance, in deep learning the weights and biases of a neural network are updated during backpropagation using the gradient flowing through each sigmoid activation. Without significant gradients, the network cannot adjust its weights and biases; this is known as the vanishing gradient problem.
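The saturation effect is easy to demonstrate numerically: the sigmoid’s gradient σ'(x) = σ(x)(1 - σ(x)) collapses towards zero as the pre-activation grows.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# the gradient shrinks rapidly for large |x| ("saturation")
for x in (0, 5, 10, 20):
    print(x, sigmoid_grad(x))
```

At x = 10 the gradient is already below 1e-4, so a weight update proportional to it would be negligible.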

Rectified Linear Units (ReLU) and other non-linear functions are alternatives that avoid this problem.
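For comparison, a minimal ReLU sketch: unlike the sigmoid, its gradient is exactly 1 for all positive inputs, so it does not saturate on that side.

```python
def relu(x):
    # keeps positive inputs unchanged and zeroes out negatives
    return max(0.0, x)

def relu_grad(x):
    # gradient is 1 for positive inputs, 0 for negative ones
    return 1.0 if x > 0 else 0.0
```

The flip side is that neurons stuck with negative pre-activations receive zero gradient, which motivates further variants such as leaky ReLU.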

Further Learning Resources

Coursera’s Supervised Machine Learning course covers the sigmoid function in depth during week three, providing an example with logistic regression.

Conclusions

In this lesson, you learned what a sigmoid function is and how to use it. More specifically, you picked up:

The difference between linear and non-linear decision boundaries

Why adding a sigmoid function in the hidden layer allows a neural network to learn non-linear decision boundaries

Any Concerns or Queries?

If you have any questions, feel free to post them at insideaiml.com, and I will do my best to respond.