System-design
How does SGLang work?
In this blog, we will learn about how SGLang works. We will also see what problem it solves, how it makes serving large language models faster, and the clever ideas that make it special.
How does Approximate Nearest Neighbor (ANN) search work?
In this blog, we will learn about Approximate Nearest Neighbor (ANN) Search, the idea that lets apps find "similar" things in a huge collection in the blink of an eye. It powers search engines, recommendation systems, face matching, and the memory behind modern AI chatbots. We will also see why the naive approach fails, how trees, hashing, clustering, and graphs make the search fast, and where ANN search is used in the real world.
How does a Google TPU work?
In this blog, we will learn about how a Google TPU works. We will also see what a TPU is, why Google built it, how it is different from a CPU and a GPU, and how it makes machine learning fast.
How do Image Embeddings work?
In this blog, we will learn about how image embeddings work. We will also see why we need image embeddings, how a computer turns a picture into numbers, how we measure the similarity between two of them, and where they are used in the real world.
How does an Embedding Cache work?
In this blog, we will learn about how an Embedding Cache works. We will also see what an embedding is, why an Embedding Cache saves us a lot of money and time, how the cache key is built, and where it is used in real systems like RAG and semantic search.
How does vLLM work?
In this blog, we will learn about how vLLM works. We will also see why we need it, how it manages memory so cleverly, and where it is used in the real world to serve large language models to many users at once.
How does GGUF work?
In this blog, we will learn about how GGUF works. We will also see what problem it solves, what is stored inside a GGUF file, how quantization makes big models fit on a normal laptop, and where it is used in real tools.
How does Token Streaming work?
In this blog, we will learn about how Token Streaming works. We will also see why we need it, how the server and the browser talk to each other to make it happen, and where it is used in real systems like ChatGPT and Claude.
How does Prompt Caching work?
In this blog, we will learn about how Prompt Caching works. We will also see why we need it, how it actually works inside a large language model, and where it is used in real systems like AI assistants and agents.
How does a Reranker work?
In this blog, we will learn about how a Reranker works. We will also see where it sits in a search and RAG pipeline, why we need it, and how it makes our answers more accurate.
How does a Vector Database work?
In this blog, we will learn about how a Vector Database works. This is one of the most important pieces behind modern AI search, recommendations, and tools like ChatGPT that answer questions from our own documents.
Android Push Notification Flow using FCM
In this blog, we will learn about the Android Push Notification Flow using FCM.
Android System Design Interviews
Today, we are going to discuss everything about Android System Design Interviews.
Write-Ahead Logging (WAL)
In this blog, we will learn about Write-Ahead Logging (WAL) and why it is used internally in databases.
Composite Index in Database
In this blog, we will learn about the Composite Index in a database and why it offers better query performance. We will also explore the impact of column order in a composite index.
Database Normalization vs Denormalization
In this blog, we will learn about database normalization and denormalization.
Evolution of HTTP
In this blog, we will learn about the evolution of HTTP.
Internals of RESP - Redis Serialization Protocol
In this blog, we are going to learn about the internals of the Redis Serialization Protocol (RESP).
HTTP Request vs HTTP Long-Polling vs WebSocket vs Server-Sent Events
In this blog, we are going to learn about HTTP Request vs HTTP Long-Polling vs WebSocket vs Server-Sent Events (SSE).
What is System Design?
In this blog, we will learn what System Design is.
How do Voice and Video Calls Work?
This blog is all about how voice and video calls work at a high level.