All Blogs

Decoding Sakana Fugu Technical Report

In this blog, we are going to learn about Sakana Fugu, a family of AI models that work like a conductor for a team of other AI models.

How do Diffusion Language Models (DLMs) work?

In this blog, we will learn about Diffusion Language Models (DLMs), a new way to make models write text. They promise to generate text differently from the LLMs we use today and, in many cases, much faster.

How does an Embedding Cache work?

In this blog, we will learn about how an Embedding Cache works. We will also see what an embedding is, why an Embedding Cache saves us a lot of money and time, how the cache key is built, and where it is used in real systems like RAG and semantic search.

How do World Models work?

In this blog, we will learn about how World Models work. We will also see why we need them, how they actually learn an internal picture of an environment, and where they are used in real systems like robotics, game-playing, and video generation.

How does vLLM work?

In this blog, we will learn about how vLLM works. We will also see why we need it, how it manages memory so cleverly, and where it is used in the real world to serve large language models to many users at once.

LLM Inference Optimization

Techniques like KV Cache, Paged Attention, Flash Attention, Speculative Decoding, Continuous Batching, and Prompt Caching are what make LLMs fast and scalable in production.

How does Function Calling work in LLMs?

In this blog, we will learn about how Function Calling works in LLMs. We will see what it is, why we need it, the key insight behind it, and, step by step, how it powers AI agents and assistants.

How does GGUF work?

In this blog, we will learn about how GGUF works. We will also see what problem it solves, what is stored inside a GGUF file, how quantization makes big models fit on a normal laptop, and where it is used in real tools.

How does Knowledge Distillation work?

In this blog, we will learn about how Knowledge Distillation works. We will also see why we need it, how a small model learns from a big model, and how this lets us run powerful AI on a phone, on an edge device, and at low cost.