🎯 Weight Initialization in Neural Networks

You’ve learned how to do forward propagation, backpropagation, and gradient descent. But now comes a crucial design decision: How do we initialize the weights?

❄️ What Happens If You Initialize All Weights to Zero?

🧠 Summary:

If all weights in a layer are initialized to 0, all neurons in that layer will compute the exact same output. As a result, during training:

  • The gradients for each neuron will be the same.
  • All neurons will update identically.
  • There will be no diversity, and they will learn the same features.

This is known as the symmetry problem, and it's bad because it defeats the purpose of having multiple neurons.
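
Here is a minimal NumPy sketch of the effect. The tiny 3-2-1 network, the made-up labels, and the learning rate are illustrative assumptions, not code from the lesson; the point is that even after several gradient-descent steps, the two hidden units still share identical weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny network: 3 inputs -> 2 hidden units -> 1 output, all weights zero
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))       # 5 training examples
Y = rng.integers(0, 2, size=(1, 5))   # made-up binary labels
W1, b1 = np.zeros((2, 3)), np.zeros((2, 1))
W2, b2 = np.zeros((1, 2)), np.zeros((1, 1))

for step in range(10):
    # forward pass
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # backward pass for binary cross-entropy loss
    dZ2 = A2 - Y
    dW2, db2 = dZ2 @ A1.T / 5, dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1, db1 = dZ1 @ X.T / 5, dZ1.mean(axis=1, keepdims=True)
    # gradient-descent update
    W1 -= 0.1 * dW1; b1 -= 0.1 * db1
    W2 -= 0.1 * dW2; b2 -= 0.1 * db2

# Symmetry was never broken: the two hidden units are still exact clones
print(np.allclose(W1[0], W1[1]))        # True
print(np.allclose(W2[0, 0], W2[0, 1]))  # True
```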

💥 Why This Happens:

Neurons are designed to specialize. If you initialize them the same way and give them the same inputs, they’ll just echo each other.

❌ Result:

The model fails to learn complex representations. Because every neuron in a layer computes the same thing, each layer effectively acts like a single neuron, and the extra capacity is wasted.

🎲 Random Initialization to the Rescue!

πŸ” What’s Different:​

Instead of setting weights to 0, we initialize them with small random values, e.g. values drawn from a standard normal distribution and scaled by 0.01:

```python
import numpy as np

# W1 (the lecture's W[1]) has shape (units in layer 1, units in layer 0);
# layer_dims is the list of layer sizes, and 0.01 keeps the initial values small
W1 = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
```

This ensures:

  • Each neuron starts with different weights
  • The model breaks symmetry
  • Neurons can learn different features
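
Extending the same idea to every layer, here is a minimal sketch of a reusable initializer. The function name initialize_parameters, the layer_dims argument, and the dictionary layout are illustrative assumptions, not code from the lesson.

```python
import numpy as np

def initialize_parameters(layer_dims, scale=0.01, seed=0):
    """Small random weights per layer; biases can safely start at zero
    because the random weights already break symmetry."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * scale
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters([3, 4, 1])      # e.g. 3 inputs, 4 hidden units, 1 output
print(params["W1"].shape, params["b1"].shape)  # (4, 3) (4, 1)
```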

📉 Why Small Values?

If weights are too large:

  • Activations like sigmoid and tanh saturate (flatten out)
  • Gradients vanish → learning slows down (see the saturation sketch after this list)

If weights are too small:

  • Signals and gradients stay small, so learning is slow, but training remains stable
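
To make the saturation point concrete, the small sketch below (sigmoid chosen as an example) evaluates the activation's derivative at increasingly large pre-activations, which is where overly large weights push the network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)

# Large weights push pre-activations far from 0, where sigmoid is nearly flat
for z in [0.1, 2.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid'(z) = {sigmoid_grad(z):.6f}")
# z =   0.1  sigmoid'(z) = 0.249376
# z =   2.0  sigmoid'(z) = 0.104994
# z =  10.0  sigmoid'(z) = 0.000045
```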

✅ Bonus: Xavier and He Initialization

Modern deep learning uses smarter strategies:

  • Xavier initialization for tanh
  • He initialization for ReLU

These scale the initial weight variance by the number of neurons feeding into (and out of) a layer, which keeps activations and gradients at a stable scale as they flow through the network.
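
As a rough sketch of both schemes (using the fan-in, normal-distribution variants; libraries also offer uniform and fan-average versions, and the layer sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128   # hypothetical layer sizes

# Xavier/Glorot (normal, fan-in form): a common default for tanh / sigmoid layers
W_xavier = rng.standard_normal((fan_out, fan_in)) * np.sqrt(1.0 / fan_in)

# He (normal): doubles the variance to compensate for ReLU zeroing out half the inputs
W_he = rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

print(W_xavier.std(), W_he.std())   # roughly 1/sqrt(256) ≈ 0.0625 and sqrt(2/256) ≈ 0.088
```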

πŸ” Slide Visual Recap​

Slide 1: Zero Initialization ❌

  • All neurons receive the same input → same output → same gradient
  • All weights update identically → no symmetry breaking

Slide 2: Random Initialization ✅

  • Weights are random and small
  • Each neuron starts unique
  • Diverse features can be learned
  • Model trains effectively

🧾 TL;DR

| Initialization | Symmetry Broken? | Risk of Vanishing Gradient | Learning Outcome |
|---|---|---|---|
| Zeros | ❌ No | No | All neurons same |
| Random (large) | ✅ Yes | ✅ Yes | Unstable training |
| Random (small) | ✅ Yes | ❌ Safer | Stable training |

📌 Codex Rule of Thumb:

> "Don’t let neurons be clones. Randomize to specialize." >