- Published on
Math Behind Gradient Descent
In this blog, we will learn about the math behind gradient descent with a step-by-step numeric example.
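As a quick preview of the idea before the step-by-step walkthrough, here is a minimal Python sketch of the gradient descent update rule w ← w − α · f′(w), applied to the toy function f(w) = (w − 4)². The function, the starting point w = 0, and the learning rate 0.1 are assumptions chosen only for illustration; they are not the worked example from this post.

# Minimal gradient descent sketch (illustrative assumptions, not this post's worked example).
# We minimize f(w) = (w - 4)^2, whose derivative is f'(w) = 2 * (w - 4).

def gradient(w):
    # Derivative of f(w) = (w - 4)^2 with respect to w
    return 2 * (w - 4)

w = 0.0              # assumed initial guess
learning_rate = 0.1  # assumed step size (alpha)

for step in range(10):
    w = w - learning_rate * gradient(w)   # update rule: w <- w - alpha * f'(w)
    print(f"step {step + 1}: w = {w:.4f}")

# Each step moves w in the direction that decreases f(w),
# so w converges toward the minimizer w = 4.

Running this prints iterates 0.8, 1.44, 1.952, ... approaching 4, which is the same shrinking-error behaviour the numeric example in this post works through by hand.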