When training a deep neural network, many parameters must be initialized and then trained through forward and backward propagation. We often spend a lot of time trying different activation functions, tuning the depth of the network, the number of units, and other hyperparameters, but we may overlook the importance of how its weights and biases are initialized. In this article, I’ll share three initialization methods (1. zero initialization, 2. random initialization, 3. He initialization) and examine their corresponding impact.
In this example, we use a three-layer neural network with the following setup: Linear –> ReLU –> Linear –> ReLU –> Linear –> Sigmoid. The first two hidden layers are (Linear + ReLU) and the last layer (L) is (Linear + Sigmoid), as illustrated in the figure below.
The dataset is created using the following code.
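The original code snippet is not reproduced here, but a dataset of this kind can be sketched as follows. This is an assumption on my part: I use scikit-learn's `make_circles` (a standard two-class toy dataset for this sort of demo), with the arrays transposed so that `X` is `(features, examples)` and `Y` is `(1, examples)`, matching the layer-dimension convention used later.

```python
# A sketch of generating a two-class 2-D dataset (assumption: the article
# uses something like sklearn's make_circles; the exact code is not shown).
import numpy as np
from sklearn.datasets import make_circles

def load_dataset(n_samples=300, noise=0.05, seed=1):
    """Return X of shape (2, m) and Y of shape (1, m)."""
    X, Y = make_circles(n_samples=n_samples, noise=noise, random_state=seed)
    return X.T, Y.reshape(1, -1)

X_train, Y_train = load_dataset()
print(X_train.shape, Y_train.shape)  # (2, 300) (1, 300)
```

The function names and sample counts here are illustrative, not the article's actual code.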
The data looks like this:
1. Zero initialization
In this case, we simply set all weights and biases to zero using np.zeros().
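A minimal sketch of this initializer, assuming the usual dictionary-of-parameters convention keyed by `"W1"`, `"b1"`, … (the helper name and the example layer sizes `[2, 10, 5, 1]` are my own choices for illustration):

```python
import numpy as np

def initialize_parameters_zeros(layer_dims):
    """Set every weight matrix and bias vector to zero."""
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_zeros([2, 10, 5, 1])
print(params["W1"].shape, params["W1"].sum())  # (10, 2) 0.0
```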
The plots below show that none of the points are correctly separated and the log-loss cost stays stagnant, since all the neurons in a layer compute the same thing and the symmetry is never broken.
2. Random initialization
In this case, the weights are randomly initialized and scaled by a large factor of 10, with the biases set to zero. We can see that the neural network now starts to learn correctly.
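A sketch of this initializer, under the same assumed conventions as before (standard-normal draws scaled by the factor 10 mentioned above; the seed is my own choice for reproducibility):

```python
import numpy as np

def initialize_parameters_random(layer_dims, seed=3):
    """Weights drawn from N(0, 1) and scaled by 10; biases set to zero."""
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 10
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```

Scaling by 10 does break the symmetry, which is why learning proceeds, but such large weights are generally a poor choice in practice.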
3. He initialization
Last, we’ll see how the ‘He’ initialization method works. He et al. (2015) proposed scaling the randomly initialized weights of layer l by sqrt(2./layer_dims[l-1]), i.e. by the square root of 2 divided by the size of the previous layer. We can see that this separates the two classes very well.
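This scaling rule can be sketched as follows, again using the assumed parameter-dictionary convention (the seed is my own choice):

```python
import numpy as np

def initialize_parameters_he(layer_dims, seed=3):
    """He et al. (2015): scale N(0, 1) weights by sqrt(2 / fan_in),
    which keeps activation variance stable through ReLU layers."""
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                    * np.sqrt(2.0 / layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```

The factor of 2 in the numerator accounts for ReLU zeroing out roughly half of its inputs, which is why this scheme pairs naturally with ReLU activations.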
As we can see, initialization is very important when training a deep neural network: proper initialization breaks the symmetry between neurons in the same layer and can make training much faster.
The forward propagation and backward propagation are shown below:
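For the Linear –> ReLU –> Linear –> ReLU –> Linear –> Sigmoid network described above, forward and backward propagation can be sketched as follows. This is a hand-rolled illustration, not the article's exact code; it assumes a log-loss (cross-entropy) cost, for which the output-layer gradient simplifies to `A3 - Y`.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, p):
    """Linear -> ReLU -> Linear -> ReLU -> Linear -> Sigmoid."""
    Z1 = p["W1"] @ X + p["b1"];  A1 = relu(Z1)
    Z2 = p["W2"] @ A1 + p["b2"]; A2 = relu(Z2)
    Z3 = p["W3"] @ A2 + p["b3"]; A3 = sigmoid(Z3)
    return A3, (Z1, A1, Z2, A2, Z3, A3)

def backward_propagation(X, Y, p, cache):
    """Gradients of the log-loss cost w.r.t. each W and b."""
    m = X.shape[1]
    Z1, A1, Z2, A2, Z3, A3 = cache
    dZ3 = A3 - Y                          # sigmoid + log loss
    dW3 = dZ3 @ A2.T / m; db3 = dZ3.sum(axis=1, keepdims=True) / m
    dZ2 = (p["W3"].T @ dZ3) * (Z2 > 0)    # ReLU derivative is 1 where Z > 0
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m; db1 = dZ1.sum(axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2,
            "dW3": dW3, "db3": db3}
```

Any of the three initializers above can supply `p`; only the starting values of the weights differ between the experiments.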