
Activation Functions for Humans: Sigmoid, Tanh, and ReLU (Deep Dive)

🤖 What Are Activation Functions?

Activation functions are what make neural networks non-linear and capable of learning complex patterns. They control whether a neuron "fires" and how strong its output signal is, based on the weighted input.

In simpler terms:

> Think of them like dimmer switches for your brain cells — deciding how much light (signal) should come through.

In technical terms: given an input value $z$, the activation function $f$ transforms it to produce the neuron's output $a = f(z)$.


πŸ” What Role Do They Play in Training?​

Activation functions are applied during forward propagation, after computing the weighted sum:

$$z = w \cdot x + b$$

Then:

$$a = f(z)$$

where $f$ is the activation function.
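Here is a minimal sketch of that forward pass for a single neuron, assuming NumPy; the weight, input, and bias values (and the choice of sigmoid as $f$) are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Example activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy values for a single neuron (weights, inputs, and bias are arbitrary)
w = np.array([0.5, -1.2, 0.8])   # weights
x = np.array([1.0, 0.3, 2.0])    # inputs
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum: z = w · x + b
a = sigmoid(z)                   # neuron output: a = f(z)

print(f"z = {z:.3f}, a = {a:.3f}")
```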

They allow the network to:

  • Learn complex patterns
  • Avoid collapsing into just a linear regression model
  • Enable gradient descent to work during backpropagation

🧪 Refresher: What is Gradient Descent?

Gradient descent is how neural networks learn. Imagine you’re walking downhill in fog trying to find the lowest point (lowest error).

Steps:

  1. Compute how steep the slope is (gradient)
  2. Take a small step downhill
  3. Repeat

If the slope (gradient) is near-zero, your steps get smaller — that’s the vanishing gradient problem.
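To make those three steps concrete, here is a tiny sketch on a made-up one-variable loss; the parabola, starting point, and learning rate are arbitrary choices for illustration.

```python
def loss(w):
    """Toy loss surface: a parabola whose lowest point (lowest error) is at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Slope of the toy loss at the current point: d/dw (w - 3)^2 = 2(w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.0      # start somewhere on the hill
lr = 0.1     # learning rate: how big each downhill step is

for step in range(25):
    g = gradient(w)    # 1. compute how steep the slope is
    w -= lr * g        # 2. take a small step downhill
                       # 3. repeat

print(f"w = {w:.3f}, loss = {loss(w):.5f}")   # w ends up close to the minimum at 3.0
```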


🧠 Comparing Activation Functions

1. Sigmoid Function

  • Output: 0 to 1
  • Looks like an S-curve
  • Used often in the output layer for binary classification

Pros:

  • Gives probability-like outputs
  • Intuitive interpretation

Cons:

  • Not zero-centered → gradients can zigzag
  • Vanishing gradient for very large or very small inputs

🧠 Veer Notes: “If we move the weight to the right, the slope increases, but the change is so minimal in extremes that it becomes bad for lots of calculations.” ✔️
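A small sketch of the sigmoid and its derivative (assuming NumPy; the sample inputs are arbitrary) shows exactly that flattening: the gradient peaks at 0.25 near zero and all but disappears at the extremes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # peaks at 0.25 when z = 0, tiny for large |z|

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"z = {z:6.1f}   sigmoid = {sigmoid(z):.4f}   gradient = {sigmoid_grad(z):.6f}")
```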


2. Tanh Function (Hyperbolic Tangent)

  • Output: -1 to 1
  • Also an S-curve
  • Often used in hidden layers

Pros:

  • Zero-centered → faster learning
  • Stronger gradients than sigmoid

Cons:

  • Still suffers from vanishing gradients (though less severely than sigmoid)

🧠 Veer Notes: “Since it goes from -1 to 1 we get a more detailed view of success vs. failure. It's useful in hidden layers.” ✔️
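The same kind of sketch for tanh (again assuming NumPy, with arbitrary sample inputs) shows the zero-centered outputs and the stronger gradient near zero.

```python
import numpy as np

for z in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    out = np.tanh(z)          # zero-centered output in (-1, 1)
    grad = 1.0 - out ** 2     # derivative of tanh: 1 - tanh(z)^2, peaks at 1.0
    print(f"z = {z:4.1f}   tanh = {out:7.4f}   gradient = {grad:.4f}")
```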


3. ReLU (Rectified Linear Unit)

  • Output: 0 to ∞
  • Very simple: zero if input is negative, identity if positive

Pros:

  • Fast to compute
  • Doesn’t squash gradients → avoids the vanishing gradient problem
  • Works well in deep networks

Cons:

  • Dying ReLU problem: if $z < 0$ too often, the neuron outputs 0 forever

🧠 Veer Notes: “It doesn’t respond when results are negative, but if they’re positive, it just keeps going. That’s why it’s strong — but it can die.” ✔️
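A quick sketch of ReLU and its gradient (assuming NumPy; the inputs are arbitrary) also shows why a neuron stuck on negative inputs stops learning: its gradient is exactly zero there.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # 0 for negatives, identity for positives

def relu_grad(z):
    return (z > 0).astype(float)     # gradient is exactly 0 or 1, never squashed

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("output:  ", relu(z))          # [0.  0.  0.  0.5 2. ]
print("gradient:", relu_grad(z))     # [0. 0. 0. 1. 1.] -- stuck at 0 whenever z < 0
```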


4. Leaky ReLU

  • Same as ReLU, but negative inputs get a small slope instead of flat zero

Pros:

  • Fixes dying ReLU
  • Lets gradients flow even for negative inputs

🧠 Veer Notes: “So that’s why you use Leaky ReLU — it keeps neurons alive that would otherwise die when they get negative inputs.” ✔️
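A minimal sketch of Leaky ReLU (assuming NumPy; the slope alpha = 0.01 is a common choice but just an assumption here) shows how negative inputs still carry a small signal and a nonzero gradient.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope (alpha) instead of going flat."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)   # gradient never hits zero, so the neuron can recover

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(leaky_relu(z))        # [-0.04 -0.01  0.    1.    4.  ]
print(leaky_relu_grad(z))   # [0.01 0.01 0.01 1.   1.  ]
```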


📊 Summary Table

| Function   | Output Range | Zero-Centered | Vanishing Gradients? | Best Used For                        |
| ---------- | ------------ | ------------- | -------------------- | ------------------------------------ |
| Sigmoid    | (0, 1)       | ❌ No         | ✅ Yes               | Output layer (binary classification) |
| Tanh       | (-1, 1)      | ✅ Yes        | ⚠️ Sometimes         | Hidden layers (classic networks)     |
| ReLU       | [0, ∞)       | ❌ No         | ❌ Rarely            | Hidden layers (modern deep nets)     |
| Leaky ReLU | (-∞, ∞)      | ❌ No         | ❌ No                | Deeper nets with risk of dying ReLUs |

🏁 Final Analogy (Veer-Style 🧒)

  • Sigmoid = Like deciding how much you agree (0 = no, 1 = yes)
  • Tanh = Like saying how strongly you agree or disagree (-1 = no way, +1 = absolutely)
  • ReLU = Like only listening to good news. Bad input? Silent. Good input? Amplify it.
  • Leaky ReLU = Like having a tiny backup mic for when the good news guy gets too quiet.

💬 Your Takeaway

> All activation functions help train the model by shaping how information flows.
>
> Sigmoid is great for binary answers. Tanh adds balance for inner reasoning.
> ReLU says “yes loudly” or “nothing at all,” and Leaky ReLU is its backup plan.
>
> The better the activation, the more efficient your learning becomes — just like switching from a spoon to a shovel when digging your ideas deeper.