All Blogs

Decoding DeepSeek-V4

In this blog, we will learn about DeepSeek-V4, the new family of open Mixture-of-Experts language models that natively supports a one-million-token context window at dramatically lower inference cost.

LoRA - Low-Rank Adaptation of LLMs

In this blog, we will learn about LoRA - Low-Rank Adaptation of Large Language Models.

Math Behind RoPE (Rotary Position Embedding)

In this blog, we will learn about the math behind Rotary Position Embedding (RoPE) and why it is used in modern Large Language Models.

Grouped Query Attention

In this blog, we will learn about Grouped-Query Attention (GQA) and how it differs from Multi-Head Attention (MHA).

Math Behind Cross-Entropy Loss

In this blog, we will learn about the math behind Cross-Entropy Loss with a step-by-step numeric example.

Math Behind Gradient Descent

In this blog, we will learn about the math behind Gradient Descent with a step-by-step numeric example.

Decoding Vision Transformer (ViT)

In this blog, we will learn about the Vision Transformer (ViT) by decoding how it splits an image into patches, turns those patches into tokens, and processes them with a transformer to classify the image.

Feed-Forward Networks in LLMs

In this blog, we will learn about Feed-Forward Networks in LLMs - what they are, how they work inside the Transformer architecture, why every Transformer layer needs one, and what role they play in making Large Language Models so powerful.

Decoding Flash Attention in LLMs

In this blog, we will learn about Flash Attention by decoding it piece by piece - why standard attention is slow, what makes Flash Attention fast, how it uses GPU memory cleverly, and why it is used in almost every modern Large Language Model (LLM).