<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Asmit Tyagi - Engineering Notes]]></title><description><![CDATA[Asmit Tyagi - Engineering Notes]]></description><link>https://blog.asmittyagi.com</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 22:31:19 GMT</lastBuildDate><atom:link href="https://blog.asmittyagi.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Understanding Stack Data Structure in JavaScript (With Real Examples)]]></title><description><![CDATA[What is a Stack?

A stack is a data structure where the last thing you put in is the first thing you take out.

Key Rule:
LIFO (Last In, First Out)
Real-Life Examples of Stack
Undo / Redo in a Text Editor

We type A B C in text editor, A, B , C all a...]]></description><link>https://blog.asmittyagi.com/understanding-stack-data-structure-in-javascript-with-real-examples</link><guid isPermaLink="true">https://blog.asmittyagi.com/understanding-stack-data-structure-in-javascript-with-real-examples</guid><category><![CDATA[stack]]></category><category><![CDATA[#StackDataStructure]]></category><category><![CDATA[Stacks]]></category><category><![CDATA[data structures]]></category><dc:creator><![CDATA[Asmit Tyagi]]></dc:creator><pubDate>Thu, 22 Jan 2026 11:21:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769069601600/6a100257-a2f1-436a-9c57-3cadd0fda0f4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-what-is-a-stack">What is a Stack?</h1>
<blockquote>
<p>A stack is a data structure where the last thing you put in is the first thing you take out.</p>
</blockquote>
<h2 id="heading-key-rule">Key Rule:</h2>
<p><strong>LIFO (Last In, First Out)</strong></p>
<h2 id="heading-real-life-examples-of-stack"><strong>Real-Life Examples of Stack</strong></h2>
<h3 id="heading-undo-redo-in-a-text-editor"><strong>Undo / Redo in a Text Editor</strong></h3>
<blockquote>
<p>Say we type A, B, C in a text editor. Each character is pushed onto Stack 1 (we keep two stacks: one for undo, one for redo; whatever we type goes into Stack 1, and Stack 2 starts out empty).</p>
<p>Then we press Ctrl + Z. Now C is gone. Where did it go? Into Stack 2, so that if we hit Redo we can take that value from Stack 2 and push it back onto Stack 1.</p>
<p>Makes sense?</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769070439143/2e90dcaf-d613-4af7-a38c-c039d300d9ac.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-a-stack-is-ultimately-a-list-but-with-strict-access-rules">A stack is <strong>ultimately a list</strong>, but with <strong>strict access rules</strong>.</h1>
<blockquote>
<p>Rules:</p>
<ul>
<li><p>Elements can be added only from one end (top)</p>
</li>
<li><p>Elements can be removed only from that same end</p>
</li>
<li><p>No access to middle or bottom elements</p>
</li>
</ul>
</blockquote>
<h2 id="heading-defining-a-stack-in-javascript"><strong>Defining a Stack in JavaScript</strong></h2>
<blockquote>
<p>To create a stack, we first need a <strong>container</strong> that will hold our data.</p>
<p>In JavaScript, the simplest and most efficient choice is an <strong>array</strong>.</p>
</blockquote>
<p>So we start by defining a Stack class.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }
}
</code></pre>
<h3 id="heading-what-is-happening-here"><strong>What is happening here?</strong></h3>
<ul>
<li><p>We create a class called Stack</p>
</li>
<li><p>Inside the constructor, we initialize an empty array</p>
</li>
<li><p>This array will store all stack elements</p>
</li>
<li><p>The <strong>end of the array represents the top of the stack</strong></p>
</li>
</ul>
<blockquote>
<p>At this point, we have a stack structure, but it can’t do anything yet.</p>
</blockquote>
<h2 id="heading-adding-data-to-the-stack-push"><strong>Adding Data to the Stack (push)</strong></h2>
<p>To add elements to a stack, we use the push operation.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  <span class="hljs-comment">// Stack only grows from one end. This is how new data enters.</span>
  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }
<span class="hljs-comment">// Mental Model: I am doing something new - save it for possible undo later.</span>
}
</code></pre>
<h3 id="heading-what-this-does"><strong>What this does:</strong></h3>
<ul>
<li><p>Takes data as input</p>
</li>
<li><p>Adds it to the <strong>top of the stack</strong></p>
</li>
<li><p>Internally, Array.push() adds the element to the end of the array</p>
</li>
</ul>
<p>This follows stack rules because:</p>
<ul>
<li><p>We are adding elements only from one end</p>
</li>
<li><p>No middle or bottom insertion is allowed</p>
</li>
</ul>
<h2 id="heading-removing-data-from-the-stack-pop"><strong>Removing Data from the Stack (pop)</strong></h2>
<p>To remove the most recent element, we use pop.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }

  pop(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.pop()
  }
<span class="hljs-comment">// Mental Trigger: Take back the most recent thing.</span>
}
</code></pre>
<h3 id="heading-what-this-does-1"><strong>What this does:</strong></h3>
<ul>
<li><p>Removes the <strong>top element</strong> of the stack</p>
</li>
<li><p>Follows LIFO (Last In, First Out)</p>
</li>
<li><p>Uses JavaScript’s Array.pop() internally</p>
</li>
</ul>
<p>This is how undo, backtracking, and function calls work.</p>
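<p>The undo / redo example from earlier can be sketched with two plain stacks. This is a minimal, hypothetical sketch (the helper names <code>type</code>, <code>undo</code>, and <code>redo</code> are ours, not part of any editor API):</p>
<pre><code class="lang-javascript">// Two stacks: one holds typed characters (undo), one holds undone ones (redo).
const undoStack = []
const redoStack = []

function type(char) {
  undoStack.push(char)
  redoStack.length = 0 // typing something new clears the redo history
}

function undo() {
  if (undoStack.length === 0) return // nothing to undo
  redoStack.push(undoStack.pop())
}

function redo() {
  if (redoStack.length === 0) return // nothing to redo
  undoStack.push(redoStack.pop())
}

type("A"); type("B"); type("C")
undo()                 // C moves to the redo stack
console.log(undoStack) // [ 'A', 'B' ]
redo()                 // C comes back
console.log(undoStack) // [ 'A', 'B', 'C' ]
</code></pre>
<p>Notice that both stacks only ever use push and pop - exactly the discipline described above.</p>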
<h2 id="heading-viewing-the-top-element-peek"><strong>Viewing the Top Element (peek)</strong></h2>
<p>Sometimes we only want to <strong>see</strong> the top element without removing it.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }

  pop(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.pop()
  }

<span class="hljs-comment">// Why it exists: sometimes we need to look before we act.</span>
  peek(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack[<span class="hljs-built_in">this</span>.stack.length - <span class="hljs-number">1</span>]
  }
}
</code></pre>
<h3 id="heading-what-this-does-2"><strong>What this does:</strong></h3>
<ul>
<li><p>Accesses the last element of the array</p>
</li>
<li><p>Returns it without modifying the stack</p>
</li>
</ul>
<p>This is useful when:</p>
<ul>
<li><p>You want to know what will be popped next</p>
</li>
<li><p>You need to validate something before removing it</p>
</li>
</ul>
<h2 id="heading-checking-if-the-stack-is-empty-isempty"><strong>Checking if the Stack Is Empty (isEmpty)</strong></h2>
<p>Before popping, it’s important to know whether the stack has elements or not.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }

  pop(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.pop()
  }

  peek(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack[<span class="hljs-built_in">this</span>.stack.length - <span class="hljs-number">1</span>]
  }

<span class="hljs-comment">// Why it exists: popping from an empty stack = bug / crash / undefined behaviour</span>
  isEmpty(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.length === <span class="hljs-number">0</span>
  }
}
</code></pre>
<h3 id="heading-what-this-does-3"><strong>What this does:</strong></h3>
<ul>
<li><p>Returns true if the stack has no elements</p>
</li>
<li><p>Returns false otherwise</p>
</li>
</ul>
<p>This prevents:</p>
<ul>
<li><p>Errors</p>
</li>
<li><p>Unexpected behavior</p>
</li>
<li><p>Crashes from popping an empty stack</p>
</li>
</ul>
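<p>Combining isEmpty with pop gives a safe removal pattern. A minimal sketch (the <code>safePop</code> name is ours, just for illustration):</p>
<pre><code class="lang-javascript">// Guarded pop: check isEmpty first instead of popping blindly.
class SafeStack {
  constructor() { this.stack = [] }
  push(data) { this.stack.push(data) }
  isEmpty() { return this.stack.length === 0 }
  safePop() {
    if (this.isEmpty()) return undefined // empty: nothing to remove, no crash
    return this.stack.pop()
  }
}

const s = new SafeStack()
console.log(s.safePop()) // undefined
s.push(42)
console.log(s.safePop()) // 42
</code></pre>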
<h2 id="heading-stack-utility-methods-size-clear-contains-and-reverse"><strong>Stack Utility Methods: size, clear, contains, and reverse</strong></h2>
<p>After implementing the core stack operations, we usually need a few <strong>helper methods</strong> to make the stack easier to work with.</p>
<p>These methods don’t change how a stack behaves, but they help us <strong>inspect, reset, or validate</strong> the stack.</p>
<p>Below is the relevant part of the stack implementation:</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }

  pop(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.pop()
  }

  peek(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack[<span class="hljs-built_in">this</span>.stack.length - <span class="hljs-number">1</span>]
  }

  isEmpty(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.length === <span class="hljs-number">0</span>
  }

<span class="hljs-comment">// Returns the number of elements in the stack</span>
  size(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.length
  }

<span class="hljs-comment">// Clears the entire stack. Yes, it’s that simple.</span>
  clear(){
    <span class="hljs-built_in">this</span>.stack = []
  }

<span class="hljs-comment">// Checks if a value exists anywhere in the stack</span>
  contains(element){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.includes(element)
  }

<span class="hljs-comment">// Optional: reverses the stack in place</span>
  reverse(){
    <span class="hljs-built_in">this</span>.stack.reverse()
  }

}
</code></pre>
<hr />
<h3 id="heading-size"><strong>size()</strong></h3>
<p>The size() method returns the total number of elements currently present in the stack.</p>
<p>It simply returns the length of the underlying array and does <strong>not</strong> modify the stack in any way.</p>
<p>This is useful for debugging, validations, or when you need to know how full the stack is.</p>
<hr />
<h3 id="heading-clear"><strong>clear()</strong></h3>
<p>The clear() method removes all elements from the stack.</p>
<p>Instead of popping elements one by one, we just assign a <strong>new empty array</strong>.</p>
<p>Sometimes the easiest solution really is to start fresh.</p>
<p>This is commonly used when:</p>
<ul>
<li><p>Resetting application state</p>
</li>
<li><p>Clearing undo history</p>
</li>
<li><p>Reinitializing the stack</p>
</li>
</ul>
<hr />
<h3 id="heading-containselement"><strong>contains(element)</strong></h3>
<p>The contains() method checks whether a given element exists anywhere in the stack.</p>
<p>While this is useful, it’s worth noting that this method <strong>breaks pure stack abstraction</strong>, since a stack is supposed to expose only the top element.</p>
<p>That said, it’s perfectly fine as a <strong>utility method</strong> for learning, debugging, or validation.</p>
<hr />
<h2 id="heading-using-the-stack-final-example"><strong>Using the Stack (Final Example)</strong></h2>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Stack</span></span>{
  <span class="hljs-keyword">constructor</span>(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  <span class="hljs-comment">// To add data in stack</span>
  push(data){
    <span class="hljs-built_in">this</span>.stack.push(data)
  }

  pop(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.pop()
  }

  peek(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack[<span class="hljs-built_in">this</span>.stack.length - <span class="hljs-number">1</span>]
  }

  isEmpty(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.length === <span class="hljs-number">0</span>
  }

  size(){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.length
  }

  clear(){
    <span class="hljs-built_in">this</span>.stack = []
  }

  contains(element){
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.stack.includes(element)
  }

  reverse(){
    <span class="hljs-built_in">this</span>.stack.reverse()
  }

  printStack(){
    <span class="hljs-keyword">let</span> str = <span class="hljs-string">""</span>
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-built_in">this</span>.stack.length; i++) {
      str += <span class="hljs-built_in">this</span>.stack[i] + <span class="hljs-string">"\n"</span>
    }
    <span class="hljs-keyword">return</span> str
  }
}

<span class="hljs-keyword">const</span> myStack = <span class="hljs-keyword">new</span> Stack()

myStack.push(<span class="hljs-number">8</span>)
myStack.push(<span class="hljs-number">3</span>)
myStack.push(<span class="hljs-number">4</span>)
myStack.push(<span class="hljs-number">3</span>)

<span class="hljs-built_in">console</span>.log(<span class="hljs-string">"This is the Element at Top: "</span>, myStack.peek())
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Now printing the stack values: "</span>)
<span class="hljs-built_in">console</span>.log(myStack.printStack())
</code></pre>
<pre><code class="lang-javascript">asmit~$ node stack/index.js
This is the Element at Top:  <span class="hljs-number">3</span>
Now printing the stack values: 
<span class="hljs-number">8</span>
<span class="hljs-number">3</span>
<span class="hljs-number">4</span>
<span class="hljs-number">3</span>
</code></pre>
<h2 id="heading-this-shows">This shows:</h2>
<ul>
<li><p>peek() returns the most recently added element</p>
</li>
<li><p>printStack() displays the stack from bottom to top</p>
</li>
<li><p>Stack behavior follows <strong>Last In, First Out</strong></p>
</li>
</ul>
<hr />
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>A stack is simple in structure but extremely powerful in practice.</p>
<p>It is used in:</p>
<ul>
<li><p>Undo / Redo systems</p>
</li>
<li><p>Function call handling</p>
</li>
<li><p>Expression evaluation</p>
</li>
<li><p>Backtracking problems</p>
</li>
</ul>
<p>Once you understand stacks clearly, learning <strong>Queue</strong>, <strong>Linked List</strong>, and <strong>Recursion</strong> becomes much easier.</p>
<p>This implementation is intentionally kept simple to focus on <strong>understanding</strong>, not overengineering.</p>
<hr />
<h3 id="heading-whats-next"><strong>What’s Next?</strong></h3>
<p>In the next article, we’ll look at the Queue data structure and see how changing just one rule completely changes behavior.</p>
<hr />
<h3 id="heading-key-takeaway"><strong>Key Takeaway</strong></h3>
<blockquote>
<p>A stack is just a list with discipline.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[How ChatGPT (AI) Understands You 
(Almost Like a Human)]]></title><description><![CDATA[Introduction
When you type something into ChatGPT, it feels like you’re talking to a smart friend who magically “gets” English, Hindi, Hinglish, emojis, sarcasm - just everything.
But here’s the twist:
AI doesn’t understand English.
Not even a little...]]></description><link>https://blog.asmittyagi.com/how-chatgpt-works</link><guid isPermaLink="true">https://blog.asmittyagi.com/how-chatgpt-works</guid><category><![CDATA[AI]]></category><category><![CDATA[ChaiCode]]></category><category><![CDATA[Chaiaurcode]]></category><category><![CDATA[genai]]></category><dc:creator><![CDATA[Asmit Tyagi]]></dc:creator><pubDate>Thu, 20 Nov 2025 11:59:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763635964652/141451f6-a3fe-44d2-8fcb-dab42b1c6d6b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>When you type something into ChatGPT, it feels like you’re talking to a smart friend who magically “gets” English, Hindi, Hinglish, emojis, sarcasm - just everything.</p>
<p>But here’s the twist:</p>
<p><strong>AI doesn’t understand English.</strong></p>
<p><strong>Not even a little.</strong></p>
<p>And that’s where things get interesting.</p>
<p>We speak in language.</p>
<p>AI speaks in <strong>numbers</strong>.</p>
<p>So every conversation sits on top of a giant translation layer that quietly works behind the scenes, turning your words into math and math back into words - all in milliseconds.</p>
<p>Before we get into the heavy-duty AI machinery, let’s slow down and understand the basics.</p>
<hr />
<h2 id="heading-language-meaning-how-humans-do-it"><strong>Language → Meaning: How Humans Do It</strong></h2>
<p>Imagine someone picks up a Hindi-to-English dictionary and tries to translate:</p>
<p><strong>“Kaise ho aap?” → “How are you?”</strong></p>
<p>Even without a dictionary, your brain knows the meaning instantly.</p>
<p>You don’t spell out K-A-I-S-E.</p>
<p>You don’t break it into syllables.</p>
<p>Your brain jumps straight to <strong>meaning</strong> - a feeling, an understanding, a memory.</p>
<p>When you hear <strong>“chai”</strong>, you don’t see “C-H-A-I”.</p>
<p>You sense warmth, aroma, comfort, maybe even a rainy evening.</p>
<p>This is how humans process language:</p>
<ul>
<li><p>We hear words</p>
</li>
<li><p>We convert them to meaning</p>
</li>
<li><p>Meaning triggers a mental pattern</p>
</li>
</ul>
<p>AI tries to do something similar - but with math instead of neurons.</p>
<hr />
<h2 id="heading-step-1-tokenization-breaking-words-into-pieces"><strong>Step 1: Tokenization → Breaking Words Into Pieces</strong></h2>
<p>Before AI can understand anything, it needs to chop your text into tiny units called <strong>tokens</strong>.</p>
<p>The sentence:</p>
<p>“How are you doing today?”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763636589742/11e1c177-f5bc-42eb-8e92-2383d934472d.png" alt class="image--center mx-auto" /></p>
<p>might become something like:</p>
<p>[“How”, “are”, “you”, “doing”, “today”]</p>
<p>Think of <strong>tokenization</strong> as the model’s way of saying:</p>
<p><strong>“Let me break this sentence into pieces that I can turn into numbers - the form the model can actually understand.”</strong></p>
<hr />
<p>For example: Here’s how some of those tokens look:</p>
<ul>
<li><p><strong>“How” → 5299</strong></p>
</li>
<li><p><strong>“are” → 553</strong></p>
</li>
<li><p><strong>“you” → 481</strong></p>
</li>
<li><p><strong>“doing” → 5306</strong></p>
</li>
<li><p><strong>“today” → 4044</strong></p>
</li>
</ul>
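<p>The IDs above are illustrative - real tokenizers (like GPT’s byte-pair encoding) split text into subwords, not whole words. But a toy word-level tokenizer makes the idea concrete:</p>
<pre><code class="lang-javascript">// Toy word-level tokenizer (illustrative only; real models use subword BPE,
// and these vocab IDs are just the example numbers from above).
const vocab = { "How": 5299, "are": 553, "you": 481, "doing": 5306, "today": 4044 }

function tokenize(sentence) {
  return sentence
    .replace("?", "")         // drop punctuation for this toy example
    .split(" ")               // word-level split
    .map(word => vocab[word]) // look up each word's ID
}

console.log(tokenize("How are you doing today?"))
// [ 5299, 553, 481, 5306, 4044 ]
</code></pre>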
<hr />
<h3 id="heading-under-the-hood-the-tokenizer-code">Under the Hood: The Tokenizer Code</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763636677158/f4fc53ed-467b-4623-bb5e-db4bfa72609f.png" alt /></p>
<h3 id="heading-the-actual-tokens"><strong>The Actual Tokens</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763636702204/99ea4656-49c6-43e0-98f0-702d2e0a67ae.png" alt /></p>
<blockquote>
<p>These IDs now move into the next step: embeddings → where actual meaning gets constructed.</p>
</blockquote>
<hr />
<h2 id="heading-step-2-embeddings-turning-tokens-into-meaning"><strong>Step 2: Embeddings → Turning Tokens Into Meaning</strong></h2>
<p>After tokenization, all we have is a list of token IDs:</p>
<p><code>[5299, 553, 481, 5306, 4044]</code></p>
<p>Useful?</p>
<p>Not really.</p>
<p>Token IDs are just <strong>labels</strong> - they carry <strong>zero meaning</strong>.</p>
<p>The model can’t understand anything from them.</p>
<p>This is where embeddings step in.</p>
<h3 id="heading-what-embeddings-actually-do"><strong>What Embeddings Actually Do</strong></h3>
<p>Embeddings convert each token into a <strong>vector</strong> - a list of hundreds or thousands of numbers that represent the <em>meaning</em> of that word.</p>
<p>Example (conceptual):</p>
<pre><code class="lang-python"><span class="hljs-string">"chai"</span> → [<span class="hljs-number">-0.12</span>, <span class="hljs-number">0.58</span>, <span class="hljs-number">1.29</span>, <span class="hljs-number">-0.44</span>, ...]
<span class="hljs-string">"tea"</span>  → [<span class="hljs-number">-0.10</span>, <span class="hljs-number">0.61</span>, <span class="hljs-number">1.33</span>, <span class="hljs-number">-0.40</span>, ...]
</code></pre>
<p>Look at those two vectors… almost identical, right?</p>
<p>That’s the idea.</p>
<p>Words with similar meaning live <strong>close together</strong> in this mathematical space.</p>
<p>It’s like a giant map where:</p>
<ul>
<li><p>“Kitten” is near “cat”</p>
</li>
<li><p>“dog” is near “wolf”</p>
</li>
<li><p>“Apple” is closer to “banana” than to “cat”</p>
</li>
</ul>
<p><strong>Embeddings = meaning.</strong></p>
<p>Here’s a visual that shows exactly how tokens cluster in vector space:</p>
<p><img src="https://weaviate.io/assets/images/vector-search-c9852b39f62abb6122b2123e6d5f7ed5.jpg" alt="A Gentle Introduction to Vector Databases | Weaviate" /></p>
<p>Words that share meaning appear close together in vector space.</p>
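<p>“Close together” is usually measured with cosine similarity. Here is a sketch using the conceptual 4-number vectors from above (real embeddings have hundreds of dimensions, and these numbers are made up):</p>
<pre><code class="lang-javascript">// Cosine similarity: close to 1 means similar direction (similar meaning).
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i !== a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const chai = [-0.12, 0.58, 1.29, -0.44] // conceptual vectors from the example
const tea  = [-0.10, 0.61, 1.33, -0.40]
const cat  = [0.91, -0.25, 0.02, 0.77]  // made-up, unrelated word

console.log(cosine(chai, tea)) // close to 1
console.log(cosine(chai, cat)) // much lower
</code></pre>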
<p>When you hear the word <em>“chai”</em>:</p>
<p>You don’t think:</p>
<blockquote>
<p>“C-H-A-I”</p>
</blockquote>
<p>Your brain fires a <strong>pattern</strong> - a memory of taste, smell, warmth, maybe Baarish (Rain) vibes.</p>
<p>Similarly, AI stores meaning as a <strong>pattern of numbers</strong>.</p>
<p>Different system, same idea.</p>
<p>This is why embeddings are often described as the model’s “memory space.”</p>
<hr />
<h3 id="heading-tiny-code-example-getting-an-embedding"><strong>Tiny Code Example: Getting an Embedding</strong></h3>
<p>Here’s a small snippet that fetches the embedding vector for the word <strong>“chai”</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763646517458/7acfcd3d-2002-47f1-91af-6ef87876a4c6.png" alt class="image--center mx-auto" /></p>
<p><strong>What you’ll see:</strong></p>
<ul>
<li><p>The vector will be around <strong>1536 dimensions</strong></p>
</li>
<li><p>And the first few numbers will look random - but they encode meaning</p>
</li>
</ul>
<hr />
<h3 id="heading-preview-of-embedding-output"><strong>Preview of Embedding Output</strong></h3>
<p>Embedding length: 1536</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763646494317/4d9c83cb-e175-49bd-a135-6c081916e465.png" alt class="image--center mx-auto" /></p>
<p>This long list of numbers <strong>is how the model understands your text</strong>.</p>
<p>Not as words.</p>
<p>Not as grammar.</p>
<p>But as pure meaning patterns.</p>
<h3 id="heading-why-this-matters"><strong>Why This Matters</strong></h3>
<p>Now the model has everything it needs to actually <em>think</em>:</p>
<ul>
<li><p>It knows what each word “means.”</p>
</li>
<li><p>It knows which words relate to each other.</p>
</li>
<li><p>It knows how words cluster together into concepts.</p>
</li>
</ul>
<p>The next step?</p>
<blockquote>
<p>Now the model knows what our words <em>mean</em> -</p>
<p>but it still doesn’t know the <strong>order</strong> in which we said them.</p>
<p>Because embeddings only capture meaning,</p>
<p>“The cat sat on the mat”</p>
<p>and</p>
<p>“The mat sat on the cat”</p>
<p>use the same words and <strong>would produce the same embeddings</strong>, just arranged differently.</p>
<p>But the model still has <strong>no way</strong> to understand:</p>
<ul>
<li><p>who sat on whom</p>
</li>
<li><p>what happened first</p>
</li>
<li><p>what the sentence <em>actually</em> means</p>
</li>
</ul>
<p>How does the model understand <em>order</em>?</p>
<p>That’s where <strong>Positional Encoding</strong> comes in.</p>
</blockquote>
<hr />
<h2 id="heading-step-3-positional-encoding-teaching-the-model-word-order"><strong>Step 3: Positional Encoding → Teaching the Model Word Order</strong></h2>
<p>By now, the model knows:</p>
<ul>
<li><p>what each word <em>means</em> (embeddings)</p>
</li>
<li><p>how words relate in meaning</p>
</li>
</ul>
<p>But there’s still a major problem:</p>
<p><strong>The model has no idea what order the words came in.</strong></p>
<p>Embeddings capture meaning…</p>
<p>but <strong>not sequence</strong>.</p>
<h3 id="heading-why-order-matters"><strong>Why Order Matters</strong></h3>
<p>Look at these two sentences:</p>
<p><strong>“The cat sat on the mat.”</strong></p>
<p><strong>“The mat sat on the cat.”</strong></p>
<p>They contain the exact same words.</p>
<p>They would produce <strong>the same embeddings</strong>, just arranged differently.</p>
<p>But the meaning?</p>
<p>100% opposite.</p>
<p>Without knowing which word comes where, the model can’t understand:</p>
<ul>
<li><p>who did the action</p>
</li>
<li><p>what happened first</p>
</li>
<li><p>the actual intent of the sentence</p>
</li>
</ul>
<p>So how do we fix this?</p>
<hr />
<h3 id="heading-positional-encoding-giving-words-a-sense-of-place"><strong>Positional Encoding: Giving Words a Sense of Place</strong></h3>
<p>To teach the model <strong>order</strong>, we add a tiny pattern to every word embedding - something like:</p>
<ul>
<li><p>Word 1 → position pattern A</p>
</li>
<li><p>Word 2 → position pattern B</p>
</li>
<li><p>Word 3 → position pattern C</p>
</li>
</ul>
<p>These patterns are created using a <strong>mathematical function</strong></p>
<p>(don’t worry, we don’t need to touch the formulas - that’s deep ML engineer territory).</p>
<p>This function slightly shifts each embedding so the model can feel:</p>
<ul>
<li><p>“I’m the first word.”</p>
</li>
<li><p>“I’m the second word.”</p>
</li>
<li><p>“I come after ‘cat’ but before ‘mat’.”</p>
</li>
</ul>
<p>All you really need to know:</p>
<blockquote>
<p><strong>Positional encodings inject order into meaning.</strong></p>
</blockquote>
<p>It’s like giving each word a small GPS coordinate, so the model knows <em>where</em> it is in the sentence.</p>
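<p>For the curious: the original Transformer paper builds these patterns from sine and cosine waves of different frequencies. A tiny, simplified sketch of that scheme (real models add these values to each embedding; <code>dModel</code> is assumed even here):</p>
<pre><code class="lang-javascript">// Sinusoidal positional encoding: each position gets a unique wave pattern.
// Dimensions alternate between sin and cos at decreasing frequencies.
function positionalEncoding(position, dModel) {
  const pattern = []
  for (let i = 0; i !== dModel; i += 2) {
    const freq = 1 / Math.pow(10000, i / dModel)
    pattern.push(Math.sin(position * freq))
    pattern.push(Math.cos(position * freq))
  }
  return pattern
}

console.log(positionalEncoding(0, 4)) // [ 0, 1, 0, 1 ]
console.log(positionalEncoding(1, 4)) // a different pattern: position 1's "GPS"
</code></pre>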
<hr />
<h3 id="heading-why-this-step-is-crucial"><strong>Why This Step Is Crucial</strong></h3>
<p>With positional encoding:</p>
<ul>
<li><p>“cat” knows it comes <em>before</em> “sat”</p>
</li>
<li><p>“sat” knows its subject is “cat”</p>
</li>
<li><p>“mat” knows it’s the location, not the actor</p>
</li>
</ul>
<p>Now the model can actually understand the structure of your sentence.</p>
<p>Meaning + Order = Understanding.</p>
<hr />
<h3 id="heading-the-big-picture"><strong>The Big Picture</strong></h3>
<p>Up to now, your text has gone through:</p>
<ol>
<li><p><strong>Tokenization</strong> → break into pieces</p>
</li>
<li><p><strong>Embeddings</strong> → convert into meaning</p>
</li>
<li><p><strong>Positional Encoding</strong> → understand order</p>
</li>
</ol>
<p>Now the model has everything it needs to read your input properly.</p>
<p>So the next question is:</p>
<blockquote>
<p>Once the model knows <em>what</em> you said and <em>in what order</em>…</p>
<p><strong>how does it decide what to pay attention to?</strong></p>
</blockquote>
<p>That’s where <strong>Self-Attention</strong> comes in - the heart of the <strong>Transformer</strong>.</p>
<hr />
<h2 id="heading-step-4-self-attention-how-the-model-figures-out-who-matters"><strong>Step 4: Self-Attention → How the Model Figures Out “Who Matters?”</strong></h2>
<p>Now the model knows two things:</p>
<ol>
<li><p><strong>What</strong> each word <em>means</em> (embeddings)</p>
</li>
<li><p><strong>Where</strong> each word is in the sentence (positional encoding)</p>
</li>
</ol>
<p>But understanding language requires one more skill:</p>
<blockquote>
<p><strong>Knowing which words depend on which.</strong></p>
</blockquote>
<p>Because meaning is not just about the words -</p>
<p>it’s about their <strong>relationships</strong>.</p>
<p>And that’s exactly what <strong>Self-Attention</strong> does.</p>
<h3 id="heading-why-we-need-self-attention"><strong>Why We Need Self-Attention</strong></h3>
<p>Take this sentence:</p>
<p><strong>“He went to the bank.”</strong></p>
<p>Does “bank” mean:</p>
<ul>
<li><p>a place with water (river bank), or</p>
</li>
<li><p>a place with money (ICICI bank)?</p>
</li>
</ul>
<p>The model doesn’t know…</p>
<p>until it looks at the <strong>other words in the sentence</strong>.</p>
<p>This is where the magic happens.</p>
<h3 id="heading-what-self-attention-actually-does"><strong>What Self-Attention Actually Does</strong></h3>
<p>Self-Attention lets <strong>every token talk to every other token</strong> and decide:</p>
<ul>
<li><p>Who is relevant to me?</p>
</li>
<li><p>Whose meaning affects my meaning?</p>
</li>
<li><p>How much should I pay attention to each word?</p>
</li>
</ul>
<p>In Hindi:</p>
<blockquote>
<p><strong>“Yaha har token ko mauka milta hai ki bhai… sentence mein kaun important hai, ek baar check karlo.”</strong></p>
<p>(Roughly: “Here, every token gets a chance to check who in the sentence actually matters.”)</p>
</blockquote>
<hr />
<h3 id="heading-example-that-makes-it-crystal-clear"><strong>Example That Makes It Crystal Clear</strong></h3>
<p><strong>1. “The river bank was flooded.”</strong></p>
<p>“bank” looks around and sees “river” → oh, water → correct meaning.</p>
<p><strong>2. “The ICICI bank was closed.”</strong></p>
<p>“bank” sees “ICICI” → financial → correct meaning.</p>
<p><strong>Same word.</strong></p>
<p><strong>Different meaning.</strong></p>
<p><strong>Context decides.</strong></p>
<p>Self-Attention is the mechanism through which this happens.</p>
<p><strong>Another Example</strong></p>
<p><strong>“A dog is sleeping on a train.”</strong></p>
<p>Here’s how Self-Attention works internally:</p>
<ul>
<li><p>“dog” pays attention to “sleeping” → action it performs</p>
</li>
<li><p>“sleeping” pays attention to “dog” → who is doing it</p>
</li>
<li><p>“train” gives location</p>
</li>
<li><p>“on” links “sleeping” ↔ “train”</p>
</li>
</ul>
<p>This is how the model builds <strong>relationships</strong> between words.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763784052594/a60ed051-8fae-4ce8-ac4f-e68d0a7b2a95.png" alt class="image--center mx-auto" /></p>
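<p>Under the hood, this “looking around” is just math: each token scores every other token with a dot product, and softmax turns the scores into attention weights. A toy sketch with made-up 2-dimensional vectors (real models learn these, in far higher dimensions):</p>
<pre><code class="lang-javascript">// Toy attention scoring: dot products measure relevance, softmax turns
// the scores into weights that sum to 1. Vectors are made up for illustration.
const vectors = {
  river: [0.9, 0.1],
  bank:  [0.8, 0.3],
  money: [0.1, 0.9],
}

const dot = (a, b) => a[0] * b[0] + a[1] * b[1]

function softmax(scores) {
  const exps = scores.map(Math.exp)
  const sum = exps.reduce((acc, x) => acc + x, 0)
  return exps.map(x => x / sum)
}

// How much should "bank" attend to "river" vs "money"?
const scores = [dot(vectors.bank, vectors.river), dot(vectors.bank, vectors.money)]
const weights = softmax(scores)
console.log(weights) // "river" gets the larger weight, nudging "bank" toward water
</code></pre>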
<hr />
<h3 id="heading-the-result"><strong>The Result</strong></h3>
<p>After self-attention, each token’s embedding becomes a <strong>context-aware embedding</strong>.</p>
<p>Meaning:</p>
<ul>
<li><p>“bank” now <em>knows</em> if it’s next to a river or a financial institution</p>
</li>
<li><p>“he” knows who “he” refers to</p>
</li>
<li><p>“dog” knows it is the subject</p>
</li>
<li><p>“train” knows it provides location</p>
</li>
</ul>
<p>The model isn’t just reading words -</p>
<p>it’s <strong>understanding relationships</strong>.</p>
<p>Self-attention takes plain word embeddings and turns them into <strong>context-aware</strong> embeddings - tokens that understand not just what they mean, but how they relate to every other word in the sentence.</p>
<p>Now the model has meaning + order + relationships.</p>
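<p>If you want to see the mechanism itself, here’s a tiny NumPy sketch of scaled dot-product self-attention - random toy weights, not a trained model, just the core idea:</p>

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # how strongly each token "looks at" each other token
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)      # softmax: each row sums to 1
    return weights @ V                               # each token becomes a weighted mix of all tokens

# 4 tokens, 8-dim embeddings - random toy numbers, not a trained model
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): same shape as the input, but now context-aware
```

<p>Each output row is a blend of all the value vectors - that blend is exactly the “looking around” described above.</p>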
<p>But one attention head can only look at the sentence from <strong>one angle</strong>.</p>
<p>To truly understand language, the model needs to think from <strong>multiple perspectives at once</strong>.</p>
<hr />
<hr />
<h2 id="heading-step-5-multi-head-attention-understanding-from-multiple-angles"><strong>Step 5: Multi-Head Attention → Understanding From Multiple Angles</strong></h2>
<p>Self-Attention gives the model one powerful ability:</p>
<blockquote>
<p><strong>Look around the sentence and decide which words matter.</strong></p>
</blockquote>
<p>But language isn’t a one-angle thing.</p>
<p>Sometimes meaning depends on:</p>
<ul>
<li><p><em>who</em> is doing something</p>
</li>
<li><p><em>what</em> action is happening</p>
</li>
<li><p><em>where</em> it’s happening</p>
</li>
<li><p><em>how</em> words are connected</p>
</li>
<li><p><em>what</em> the sentence structure looks like</p>
</li>
<li><p><em>which</em> words indicate time, tense, or sentiment</p>
</li>
</ul>
<p>And <strong>one</strong> attention head can only focus on <strong>one pattern</strong> at a time.</p>
<p>So the Transformer does something genius.</p>
<hr />
<h3 id="heading-what-multi-head-attention-actually-does"><strong>What Multi-Head Attention Actually Does</strong></h3>
<p>Instead of one attention head, the model uses <strong>many heads in parallel</strong>.</p>
<p>Each head looks at the same sentence…</p>
<p>but from its <strong>own unique perspective</strong>.</p>
<p>Examples of what different heads might focus on:</p>
<ul>
<li><p>One head tracks <strong>subject → verb</strong></p>
</li>
<li><p>One head focuses on <strong>location</strong></p>
</li>
<li><p>One head looks for <strong>objects</strong></p>
</li>
<li><p>One focuses on <strong>long-range dependencies</strong> (“because”, “however”, “although”)</p>
</li>
<li><p>One captures <strong>tense or timing</strong></p>
</li>
<li><p>One watches for <strong>who refers to whom</strong> (“he”, “she”, “it”)</p>
</li>
</ul>
<p>Think of it like a group of detectives analyzing the same scene -</p>
<p>each looking for different clues.</p>
<p>Then all heads combine their insights to form a <strong>richer understanding</strong> of the sentence.</p>
<hr />
<h3 id="heading-example"><strong>Example</strong></h3>
<p>Sentence:</p>
<p><strong>“A dog is sleeping on a train.”</strong></p>
<p>Different heads might focus on:</p>
<ul>
<li><p>Head 1 → “dog ↔ sleeping” (who is doing what)</p>
</li>
<li><p>Head 2 → “sleeping ↔ train” (action + location)</p>
</li>
<li><p>Head 3 → “on” (relation)</p>
</li>
<li><p>Head 4 → sentence structure</p>
</li>
<li><p>Head 5 → long-range context</p>
</li>
</ul>
<p>Each head sees something different.</p>
<p>Together, they give the model a <strong>complete picture</strong>.</p>
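<p>Here’s a toy NumPy sketch of that idea. Real Transformers use learned per-head projection matrices; to keep it minimal, each head here simply attends over its own slice of the embedding:</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    """Toy multi-head self-attention: each head attends over its own slice."""
    _, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]   # this head's slice = its own "perspective"
        scores = Xh @ Xh.T / np.sqrt(d_head)
        outputs.append(softmax(scores) @ Xh)     # per-head attention
    return np.concatenate(outputs, axis=-1)      # combine all heads' insights

rng = np.random.default_rng(1)
X = rng.normal(size=(7, 16))                     # 7 tokens, 16-dim embeddings (toy numbers)
out = multi_head_attention(X, num_heads=4)
print(out.shape)  # (7, 16): same shape, built from 4 different viewpoints
```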
<hr />
<h3 id="heading-why-multi-head-attention-matters"><strong>Why Multi-Head Attention Matters</strong></h3>
<p>Because language is complicated.</p>
<p>No single viewpoint is enough.</p>
<p>By using many attention heads at once, the Transformer becomes:</p>
<ul>
<li><p>more accurate</p>
</li>
<li><p>more context-aware</p>
</li>
<li><p>better at resolving ambiguity</p>
</li>
<li><p>better at understanding long sentences</p>
</li>
<li><p>better at reasoning</p>
</li>
</ul>
<p>This is why LLMs “feel” intelligent.</p>
<hr />
<p>Now the model has:</p>
<ul>
<li><p><strong>Meaning</strong> (Embeddings)</p>
</li>
<li><p><strong>Order</strong> (Positional Encoding)</p>
</li>
<li><p><strong>Relationships</strong> (Self-Attention)</p>
</li>
<li><p><strong>Multiple Perspectives</strong> (Multi-Head Attention)</p>
</li>
</ul>
<p>But there’s one more critical piece inside a Transformer block:</p>
<blockquote>
<p><strong>A Feed-Forward Neural Network to refine and polish the information.</strong></p>
</blockquote>
<hr />
<hr />
<h2 id="heading-step-6-feed-forward-network-polishing-the-meaning"><strong>Step 6: Feed-Forward Network → Polishing the Meaning</strong></h2>
<p>After multi-head attention does its job, each token now has a rich, context-aware representation.</p>
<p>But Transformers add one more small step to make the understanding even sharper:</p>
<p><strong>A Feed-Forward Neural Network (FFN).</strong></p>
<p><img src="https://media.geeksforgeeks.org/wp-content/uploads/20250722154127824734/what_is_a_feedforward_neural_network_.webp" alt="Feedforward Neural Network - GeeksforGeeks" /></p>
<p>And don’t worry - this is the simplest part of the entire model.</p>
<hr />
<h3 id="heading-what-ffn-really-does"><strong>What FFN Really Does</strong></h3>
<p>It takes the updated token representation…</p>
<p><strong>transforms it a bit using a tiny neural network…</strong></p>
<p>and sends it forward.</p>
<p>That’s literally it.</p>
<p>No loops.</p>
<p>No attention.</p>
<p>No fancy math.</p>
<p>Just a simple “take input → apply a formula → give output.”</p>
<hr />
<h3 id="heading-why-it-exists"><strong>Why It Exists</strong></h3>
<p>Think of the FFN as <strong>a mini brain</strong> inside each Transformer layer.</p>
<p>Attention helps tokens talk to each other.</p>
<p>FFN helps each token <strong>think on its own</strong> - refine itself.</p>
<hr />
<h3 id="heading-simple-analogy"><strong>Simple Analogy</strong></h3>
<p>Attention = <em>“Who matters in this sentence?”</em></p>
<p>FFN = <em>“Ok, now that I know that… let me process it internally.”</em></p>
<p>It’s polish.</p>
<p>Cleanup.</p>
<p>Refinement.</p>
<hr />
<h3 id="heading-the-flow"><strong>The Flow</strong></h3>
<p>For each token:</p>
<ol>
<li><p>Take its vector</p>
</li>
<li><p>Pass it through a small neural network (just two linear layers + activation)</p>
</li>
<li><p>Output a cleaned-up representation</p>
</li>
</ol>
<p>That’s all.</p>
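<p>In code, the whole FFN really is this small - a minimal NumPy sketch with random, untrained weights:</p>

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise FFN: linear -> ReLU -> linear, applied to each token on its own."""
    hidden = np.maximum(0, x @ W1 + b1)   # expand + activation
    return hidden @ W2 + b2               # project back down

rng = np.random.default_rng(2)
d_model, d_ff = 8, 32                     # real models expand ~4x, e.g. 768 -> 3072
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

token = rng.normal(size=(d_model,))       # one token's vector
refined = feed_forward(token, W1, b1, W2, b2)
print(refined.shape)  # (8,): same size vector, just "thought about" a bit more
```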
<hr />
<h3 id="heading-why-this-matters-1"><strong>Why This Matters</strong></h3>
<p>Because attention gives context,</p>
<p>but FFN gives structure and clarity.</p>
<p>Together, they form <strong>one Transformer block</strong>.</p>
<hr />
<hr />
<h2 id="heading-step-7-the-full-transformer-pipeline-everything-comes-together"><strong>Step 7: The Full Transformer Pipeline (Everything Comes Together)</strong></h2>
<p>Alright… <strong>deep breath.</strong></p>
<p>So far, we’ve already cracked:</p>
<ul>
<li><p>how text becomes tokens</p>
</li>
<li><p>how tokens become meaning</p>
</li>
<li><p>how we give words order</p>
</li>
<li><p>how tokens talk to each other</p>
</li>
<li><p>how the model thinks from multiple angles</p>
</li>
<li><p>how each token polishes its meaning</p>
</li>
</ul>
<p>That’s A LOT.</p>
<p>And all of it builds up to this moment.</p>
<p>There’s just one thing left:</p>
<blockquote>
<p><strong>Seeing how all these pieces fit together in one single Transformer block.</strong></p>
</blockquote>
<p><img src="https://miro.medium.com/1*pQ6bHYoJUSvGz-GE5O2xrQ.png" alt="Transformer Models by Google Brain Explained With PyTorch Implementation |  by Pragyan Subedi | Medium" /></p>
<h3 id="heading-looks-intense-right"><strong>Looks intense, right?</strong></h3>
<p>But the best part? Now it actually makes sense to you.</p>
<p>Let’s break it down at a high level.</p>
<hr />
<h3 id="heading-what-youre-seeing"><strong>What You’re Seeing</strong></h3>
<p>Each block (the rectangles) is made of:</p>
<ul>
<li><p>Multi-Head Attention</p>
</li>
<li><p>Add &amp; Norm</p>
</li>
<li><p>Feed-Forward Network</p>
</li>
<li><p>Add &amp; Norm (again)</p>
</li>
</ul>
<p>And this block is repeated <strong>N times</strong> - meaning multiple layers stacked on top of each other.</p>
<p><strong>Every single layer</strong> refines your input a bit more.</p>
<hr />
<h3 id="heading-quick-note-on-add-amp-norm"><strong>Quick Note on Add &amp; Norm</strong></h3>
<p>Since you’ll see it everywhere:</p>
<ul>
<li><p><strong>Add</strong> = add the original value back (residual)</p>
</li>
<li><p><strong>Norm</strong> = normalize for stability</p>
</li>
</ul>
<p>You don’t need the formulas - just remember:</p>
<blockquote>
<p><strong>Add &amp; Norm keeps the model stable, smooth, and sane.</strong></p>
</blockquote>
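<p>Putting it together, one whole block can be sketched in a few lines of NumPy. This is a toy version - no learned attention projections, just the attention → Add &amp; Norm → FFN → Add &amp; Norm flow, stacked N times:</p>

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token's vector for stability (the "Norm" in Add & Norm)."""
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(X):
    """One block: attention -> Add & Norm -> FFN -> Add & Norm."""
    d = X.shape[-1]
    attn = softmax(X @ X.T / np.sqrt(d)) @ X   # toy self-attention (no learned weights)
    X = layer_norm(X + attn)                   # Add (residual) & Norm
    ffn = np.maximum(0, X @ W1) @ W2           # toy feed-forward
    return layer_norm(X + ffn)                 # Add & Norm (again)

rng = np.random.default_rng(3)
W1 = rng.normal(size=(8, 32)) * 0.1
W2 = rng.normal(size=(32, 8)) * 0.1

X = rng.normal(size=(5, 8))                    # 5 tokens, 8-dim embeddings
for _ in range(6):                             # N = 6 stacked blocks
    X = transformer_block(X)
print(X.shape)  # (5, 8): refined a little more by every layer
```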
<hr />
<h2 id="heading-now-the-final-question"><strong>Now the final question…</strong></h2>
<p>We understand the internal engine.</p>
<p>But how does the model actually turn all this into:</p>
<blockquote>
<p>“Here’s the answer to your question”?</p>
</blockquote>
<p>How does the model actually take all this processing and turn it into words?</p>
<p>How does it decide:</p>
<ul>
<li><p><em>which</em> token to generate</p>
</li>
<li><p><em>why</em> that token</p>
</li>
<li><p><em>how</em> the next token follows</p>
</li>
<li><p>and <em>how</em> the full reply appears to us</p>
</li>
</ul>
<p>That’s where <strong>Step 8</strong> comes in.</p>
<hr />
<hr />
<h2 id="heading-step-8-how-the-model-generates-words-linear-softmax-next-token"><strong>Step 8: How the Model Generates Words (Linear → Softmax → Next Token)</strong></h2>
<p>We’ve finally reached the last part of the pipeline.</p>
<p>Your text has been:</p>
<ul>
<li><p>tokenized</p>
</li>
<li><p>embedded</p>
</li>
<li><p>position-encoded</p>
</li>
<li><p>passed through attention</p>
</li>
<li><p>polished by feed-forward layers</p>
</li>
<li><p>processed through multiple Transformer blocks</p>
</li>
</ul>
<p>Now the model has one job left:</p>
<blockquote>
<p><strong>Pick the next word. And then the next. And then the next…</strong></p>
</blockquote>
<p>LLMs generate <strong>one token at a time</strong>, super fast.</p>
<p>Here’s how that final decision is made.</p>
<hr />
<h3 id="heading-step-1-linear-layer-raw-scores-logits"><strong>Step 1: Linear Layer → Raw Scores (Logits)</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763790022855/3b8afb71-5d7e-4cb9-a0e7-a30fabb11661.png" alt class="image--center mx-auto" /></p>
<p>After the last Transformer block, every token representation is pushed into a simple linear layer.</p>
<p>This layer does something extremely basic:</p>
<blockquote>
<p>It gives a <strong>score</strong> for every possible next token in the entire vocabulary.</p>
</blockquote>
<p>Not probabilities.</p>
<p>Not choices.</p>
<p>Just raw scores.</p>
<p>If your vocabulary has 50,000 tokens, you get 50,000 scores.</p>
<p>Example (conceptual):</p>
<pre><code class="lang-python">Token options: [<span class="hljs-string">"I"</span>, <span class="hljs-string">"am"</span>, <span class="hljs-string">"hungry"</span>]
Linear layer scores: [<span class="hljs-number">2.3</span>, <span class="hljs-number">1.2</span>, <span class="hljs-number">-0.5</span>]
</code></pre>
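<p>As a toy NumPy sketch (made-up sizes, random weights) - the linear layer is literally one matrix multiply:</p>

```python
import numpy as np

rng = np.random.default_rng(4)
d_model, vocab_size = 8, 50_000              # toy sizes; real models: e.g. 768 and ~50k
W_out = rng.normal(size=(d_model, vocab_size))

final_hidden = rng.normal(size=(d_model,))   # last token's vector from the final block
logits = final_hidden @ W_out                # one raw score per vocabulary token
print(logits.shape)  # (50000,): a score for every possible next token
```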
<hr />
<h3 id="heading-step-2-softmax-turn-scores-into-probabilities"><strong>Step 2: Softmax → Turn Scores Into Probabilities</strong></h3>
<p>Softmax takes those raw scores and turns them into probabilities that add up to 1.</p>
<p>Example:</p>
<pre><code class="lang-python">logits: [<span class="hljs-number">2.3</span>, <span class="hljs-number">1.2</span>, <span class="hljs-number">-0.5</span>]
softmax → [<span class="hljs-number">0.72</span>, <span class="hljs-number">0.24</span>, <span class="hljs-number">0.04</span>]
</code></pre>
<p>Now the model knows:</p>
<ul>
<li><p>“I” → 72%</p>
</li>
<li><p>“am” → 24%</p>
</li>
<li><p>“hungry” → 4%</p>
</li>
</ul>
<p><strong>Softmax is NOT creativity or randomness.</strong></p>
<p>It’s just the function that converts scores → probabilities.</p>
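<p>Softmax is small enough to write out in plain Python - for the logits above, the exact probabilities come out around 72 / 24 / 4:</p>

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]   # exponentiate: keeps the order, makes all values positive
    total = sum(exps)
    return [e / total for e in exps]       # normalize so everything adds up to 1

probs = softmax([2.3, 1.2, -0.5])
print([round(p, 2) for p in probs])  # [0.72, 0.24, 0.04]
```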
<hr />
<h3 id="heading-step-3-sampling-choose-the-next-token"><strong>Step 3: Sampling → Choose the Next Token</strong></h3>
<p>Now the model must pick <strong>one</strong> token from the probability distribution.</p>
<p>There are different ways to do this:</p>
<h3 id="heading-1-greedy-sampling-simple-predictable"><strong>1. Greedy Sampling (simple + predictable)</strong></h3>
<p>Choose the highest probability token.</p>
<p>Good for factual answers.</p>
<p>Bad for creative writing.</p>
<h3 id="heading-2-temperature-controls-randomness"><strong>2. Temperature (controls randomness)</strong></h3>
<ul>
<li><p>Low temperature → safer, more focused text</p>
</li>
<li><p>High temperature → more creative, more surprising</p>
</li>
</ul>
<p>Example:</p>
<ul>
<li><p>Temperature 0.1 → “The sky is blue.”</p>
</li>
<li><p>Temperature 1.0 → “The sky is a canvas of shifting moods.”</p>
</li>
</ul>
<h3 id="heading-3-top-k-top-p-smart-creativity-filters"><strong>3. Top-k / Top-p (smart creativity filters)</strong></h3>
<p>Limit the model to the top few likely tokens so it doesn’t go crazy.</p>
<p>These strategies shape how “creative” or “serious” the model feels.</p>
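<p>Here’s a small plain-Python sketch of these strategies - a hypothetical <code>sample_next_token</code> helper, not any real library’s API:</p>

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick a token index from raw logits using temperature + optional top-k."""
    if temperature == 0:                        # greedy: always take the highest score
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]  # low T sharpens, high T flattens
    if top_k is not None:                       # keep only the k most likely tokens
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    exps = [math.exp(x) for x in scaled]        # exp(-inf) == 0, so filtered tokens get 0 weight
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.3, 1.2, -0.5]
print(sample_next_token(logits, temperature=0))             # 0: greedy always picks the top token
print(sample_next_token(logits, temperature=1.0, top_k=2))  # 0 or 1: the third token is filtered out
```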
<hr />
<h3 id="heading-step-4-repeat-again-and-again"><strong>Step 4: Repeat… again… and again</strong></h3>
<p>Once the model chooses the next token:</p>
<ol>
<li><p>It appends it to the sequence</p>
</li>
<li><p>Feeds the entire updated sequence back into the Transformer</p>
</li>
<li><p>Repeats Linear → Softmax → Sampling</p>
</li>
<li><p>Generates the next token</p>
</li>
<li><p>And so on…</p>
</li>
</ol>
<p>Until:</p>
<ul>
<li><p>the model finishes the sentence</p>
</li>
<li><p>or hits a stop token</p>
</li>
<li><p>or reaches a length limit</p>
</li>
</ul>
<p>That’s how you get complete paragraphs, stories, or explanations.</p>
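<p>The loop itself is simple. Here’s a sketch where a hypothetical lookup table stands in for the whole Transformer, just to show the append-and-repeat cycle:</p>

```python
# Toy next-token "model": a hypothetical lookup table standing in for the Transformer.
NEXT = {"I": "am", "am": "hungry", "hungry": "<eos>"}

def generate(prompt_tokens, max_len=10, stop="<eos>"):
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        next_token = NEXT.get(tokens[-1], stop)  # "Linear -> Softmax -> pick" stands in here
        if next_token == stop:                   # stop token ends generation
            break
        tokens.append(next_token)                # append, then feed the sequence back in
    return tokens

print(generate(["I"]))  # ['I', 'am', 'hungry']
```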
<hr />
<h3 id="heading-step-5-detokenization-human-readable-text"><strong>Step 5: Detokenization → Human-Readable Text</strong></h3>
<pre><code class="lang-python">[<span class="hljs-number">40</span>, <span class="hljs-number">939</span>, <span class="hljs-number">5306</span>] → <span class="hljs-string">"I am doing"</span>
</code></pre>
<p>Here’s a real example using tiktoken:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763793741022/b0f02b14-d83f-446a-ac96-c635007be4df.png" alt class="image--center mx-auto" /></p>
<p><strong>Terminal output:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763793769606/2b9ec979-0ca9-4864-a489-7dfc282f68fd.png" alt class="image--center mx-auto" /></p>
<p>This is the final magic step - turning numbers back into natural language.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763793974423/cc510eb1-b25f-4c9f-afab-8665d0f9de09.png" alt class="image--center mx-auto" /></p>
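<p>A minimal sketch of detokenization - the three-entry vocabulary below is hypothetical (a real tokenizer like tiktoken has ~50k+ entries, and the id-to-piece split here is made up), but the idea is just “join the string pieces”:</p>

```python
# Hypothetical mini-vocabulary; a real tokenizer defines the actual id -> piece mapping.
ID_TO_TOKEN = {40: "I", 939: " am", 5306: " doing"}

def detokenize(ids):
    """Map token ids back to text by concatenating their string pieces."""
    return "".join(ID_TO_TOKEN[i] for i in ids)

print(detokenize([40, 939, 5306]))  # "I am doing"
```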
<hr />
<p>So the full output process is:</p>
<p><strong>Linear Layer → Softmax → Pick Next Token → Repeat → Detokenize → Final Answer</strong></p>
<p>That’s how ChatGPT replies to you -</p>
<p>one small token at a time, insanely fast.</p>
<hr />
<hr />
<h1 id="heading-final-thoughts-the-craziest-part-it-writes-one-token-at-a-time"><strong>Final Thoughts: The Craziest Part? It Writes One Token at a Time.</strong></h1>
<p>The wildest part of all this?</p>
<p>LLMs don’t generate full sentences or paragraphs in their heads.</p>
<p>They generate <strong>one token at a time</strong>:</p>
<ul>
<li><p>pick a token</p>
</li>
<li><p>feed it back</p>
</li>
<li><p>predict the next</p>
</li>
<li><p>repeat</p>
</li>
<li><p>insanely fast</p>
</li>
</ul>
<p>That’s it.</p>
<p>That’s the entire magic behind the curtain.</p>
<p>And yet - with just token-by-token predictions, Transformers create:</p>
<ul>
<li><p>essays</p>
</li>
<li><p>jokes</p>
</li>
<li><p>poems</p>
</li>
<li><p>stories</p>
</li>
<li><p>explanations</p>
</li>
<li><p>code</p>
</li>
<li><p>full conversations</p>
</li>
</ul>
<p>Wild, right?</p>
<p>But this is only <strong>half</strong> the story.</p>
<p>Everything you learned here explains <strong>inference</strong> - how the model uses its knowledge to answer you.</p>
<p>The other half - <strong>how the model learns in the first place</strong> (training, gradients, loss functions, backprop, massive datasets) - is a world of its own.</p>
<p>And trust me… that one’s crazy too.</p>
<p><strong>So next, we’ll peel back the training side -</strong></p>
<p><strong>how an LLM goes from clueless to genius.</strong></p>
<p>Stay tuned. 😄✌🏻</p>
<hr />
]]></content:encoded></item></channel></rss>