Math
RMSNorm (Root Mean Square Layer Normalization)
In this blog, we will learn about RMSNorm, a faster and simpler alternative to Layer Normalization that powers most modern Large Language Models, including Llama, Mistral, Gemma, Qwen, PaLM, and DeepSeek.
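As a quick preview, here is a minimal NumPy sketch of the RMSNorm formula the post covers, y = x / RMS(x) · g; the function name, epsilon, and toy values are illustrative choices, not code from the post:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: rescale by the root mean square of the activations.
    # Unlike LayerNorm, there is no mean subtraction and no bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.array([1.0, 2.0, 3.0, 4.0])
g = np.ones_like(x)        # learnable gain, initialized to 1
print(rms_norm(x, g))      # ~[0.365, 0.730, 1.095, 1.461]
```

Dropping the mean subtraction is what makes RMSNorm cheaper than LayerNorm while keeping the rescaling effect.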
Math Behind RoPE (Rotary Position Embedding)
In this blog, we will learn about the math behind Rotary Position Embedding (RoPE) and why it is used in modern Large Language Models.
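As a preview, here is a minimal NumPy sketch of the rotation RoPE applies: each (even, odd) pair of dimensions is rotated by a position-dependent angle, with one frequency per pair. The function name and toy vector are illustrative; the base of 10000 is the common default from the RoPE paper:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Rotate consecutive (even, odd) dimension pairs of x by an angle
    # proportional to the token position `pos`.
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

q = np.array([1.0, 0.0, 1.0, 0.0])
print(rope(q, pos=3))   # the same vector rotated by position-3 angles
```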
Math Behind Cross-Entropy Loss
In this blog, we will learn about the math behind Cross-Entropy Loss with a step-by-step numeric example.
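As a preview, here is a minimal NumPy example of the formula, H(p, q) = -Σᵢ pᵢ log qᵢ, applied to a single example with a one-hot target; the logits below are made-up toy values:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # raw model scores
target = 0                           # index of the true class

# Softmax turns logits into probabilities (shifted for stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# With a one-hot target, cross-entropy reduces to -log of the
# probability assigned to the true class.
loss = -np.log(probs[target])
print(probs)   # ~[0.659, 0.242, 0.099]
print(loss)    # ~0.417
```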
Math Behind Gradient Descent
In this blog, we will learn about the math behind gradient descent with a step-by-step numeric example.
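As a preview, here is a tiny worked example of the update rule w ← w − η∇f(w) on a one-dimensional quadratic; the objective and learning rate are illustrative choices:

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# Update rule: w <- w - lr * f'(w), with f'(w) = 2 * (w - 3).
w, lr = 0.0, 0.1
for step in range(25):
    grad = 2 * (w - 3)
    w -= lr * grad
print(w)   # close to 3.0
```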
Math Behind Backpropagation
In this blog, we will learn about the math behind backpropagation in neural networks.
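As a preview, here is a minimal sketch of backpropagation through a one-hidden-unit network, applying the chain rule by hand; the weights, input, and target are toy values:

```python
import numpy as np

# Tiny network: h = tanh(w1 * x), y = w2 * h, loss = (y - t)^2.
x, t = 1.5, 0.5
w1, w2 = 0.8, -0.4

# Forward pass.
h = np.tanh(w1 * x)
y = w2 * h
loss = (y - t) ** 2

# Backward pass: apply the chain rule layer by layer.
dy = 2 * (y - t)             # dloss/dy
dw2 = dy * h                 # dloss/dw2
dh = dy * w2                 # dloss/dh
dw1 = dh * (1 - h**2) * x    # dloss/dw1, using tanh'(z) = 1 - tanh(z)^2

print(loss, dw1, dw2)
```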
Math Behind the √dₖ Scaling Factor in Attention
In this blog, we will learn why we divide the attention dot products by √dₖ in the Transformer architecture with a step-by-step numeric example.
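As a preview, here is a small NumPy experiment showing the effect the post explains: dot products of random dₖ-dimensional vectors have variance roughly dₖ, so softmax saturates as dₖ grows, while dividing by √dₖ brings the variance back near 1. The sample size is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
for dk in (4, 64, 1024):
    q = rng.standard_normal((2000, dk))
    k = rng.standard_normal((2000, dk))
    scores = (q * k).sum(axis=1)            # raw dot products
    print(dk, scores.var(), (scores / np.sqrt(dk)).var())
    # raw variance grows like dk; scaled variance stays near 1
```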
Math Behind Attention - Q, K, and V
In this blog, we will learn about the math behind Attention - Query (Q), Key (K), and Value (V) with a step-by-step numeric example.
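As a preview, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√dₖ)V; the shapes and random inputs are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention:
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    dk = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dk)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 queries, d_k = 4
K = rng.standard_normal((5, 4))   # 5 keys
V = rng.standard_normal((5, 4))   # 5 values
print(attention(Q, K, V).shape)   # (3, 4): one output row per query
```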