No spam, promise. I only send curated blogs that match your interests — the stuff you'd actually want to read.
Thoughts on whatever I build, break, and learn in AI, engineering, and more.
A deep dive into the Transformer architecture introduced in the landmark 2017 paper — what it is, how it works, why it replaced RNNs, and why every modern AI model from GPT to Gemini traces its roots here.
Breaking down the difference between training, fine-tuning, and inference—why they're not the same thing, what actually happens in each stage, and why understanding this makes LLM systems way less confusing.
Why tokenization is the most underrated part of LLMs—how tokens aren't words, why they affect cost and performance, and why bad tokenization breaks everything downstream.
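The token-vs-word distinction can be seen with a toy greedy longest-match tokenizer. A minimal sketch, assuming a tiny made-up vocabulary; real tokenizers (BPE, WordPiece) learn theirs from data:

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword tokenization (toy version)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest substring first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fallback: single character
            i += 1
    return tokens

vocab = {"token", "iza", "tion", "un", "believ", "able"}
tokenize("tokenization", vocab)  # → ["token", "iza", "tion"]
```

One word, three tokens: that's why token counts, not word counts, drive context limits and API cost.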
A complete breakdown of encoder-decoder architectures—how they compress sequences into context vectors, generate outputs step-by-step, why teacher forcing matters, and the four key limitations that led to attention mechanisms.
Building a fine-tuned AI to translate legal jargon into plain English—from FLAN-T5 failures to Gemma-2B success using QLoRA on a free GPU, and the engineering lessons learned along the way.
How I exposed my portfolio blog system as an MCP server so Claude could operate it with natural language — and the 5 small but painful bugs that stood in the way.
An honest, unstructured brain dump about embeddings, vector databases, and re-ranking—from confusion about what the numbers mean to understanding coordinates, similarity search, and retrieval optimization.
Breaking down quantization from scary optimization technique to simple concept—how reducing bit precision makes models smaller and faster, and why calibration matters more than the math.
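The bit-reduction idea fits in a few lines. A rough sketch of symmetric int8 quantization with a max-based calibration; the function names and numbers are illustrative, not from any particular library:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Calibration: pick a scale so the largest |weight| maps to 127
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.02, -0.5, 0.31, 0.99], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# w_hat is close to w, but each value is stored in 1 byte instead of 4
```

The calibration step (choosing `scale` from observed values) is where the quality lives; the arithmetic itself is just round, clip, and multiply back.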
A practical inference benchmark comparing DistilBERT performance on CPU vs GPU—measuring latency, throughput, and memory across different batch sizes to understand what actually happens in production.
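The latency/throughput bookkeeping behind a benchmark like this is simple; the workload below is a stand-in placeholder, not DistilBERT itself:

```python
import time

def run_batch(batch_size: int) -> None:
    sum(i * i for i in range(batch_size * 1000))  # placeholder work

batch_size, n_runs = 8, 50
start = time.perf_counter()
for _ in range(n_runs):
    run_batch(batch_size)
elapsed = time.perf_counter() - start

latency_ms = elapsed / n_runs * 1000        # time per batch
throughput = batch_size * n_runs / elapsed  # samples per second
```

Measuring both matters: bigger batches usually raise throughput while also raising per-batch latency, and production cares about the trade-off.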
Learning machine learning alone in a Tier-3 city without mentors, bootcamps, or a tech ecosystem—why constraints became advantages and how building in public taught me more than any course.
A personal reflection on breaking free from perfectionism—why I stopped over-engineering side projects and started shipping imperfect code that actually reaches users.
Breaking down Vision Language Models into their core components—vision encoders, text encoders, fusion mechanisms—and the two main paradigms: contrastive learning (CLIP-style) and generative models.
Building a semantic blog recommendation system from scratch using embeddings, vector databases, and pre-computed results—why tags aren't enough and how I integrated ML into my Next.js portfolio.
Breaking down Logistic Regression from first principles—why it exists to express confidence in binary outcomes, how sigmoid transforms linear scores into probabilities, and a minimal from-scratch implementation.
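The sigmoid idea in a nutshell: an unbounded linear score gets squashed into a probability. The weights and bias below are made-up numbers for illustration:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # linear score
    return sigmoid(z)  # squashed into (0, 1)

p = predict_proba([2.0, -1.0], weights=[0.8, 0.4], bias=0.1)
# linear score 0.8*2.0 + 0.4*(-1.0) + 0.1 = 1.3 maps to p ≈ 0.786
```

Any real-valued score works: large positive scores approach 1, large negative scores approach 0, and a score of 0 gives exactly 0.5.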
Understanding AI, Machine Learning, and Deep Learning as a hierarchy rather than competing terms—from the broad AI umbrella to data-driven ML to neural-network-based deep learning.
A beginner-friendly breakdown of RAG's five core steps: from document preprocessing and chunking to embeddings, vector databases, and how LLMs use retrieved context to generate accurate answers.
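The retrieval half of those steps, sketched end to end in a toy, self-contained form. The bag-of-words "embedding" here is purely illustrative; real systems use learned embedding models and a proper vector database:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy stand-in for a real embedding

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: preprocess documents into chunks; step 3: embed; step 4: store
chunks = [
    "transformers use self attention",
    "mongodb stores documents as bson",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 5: at query time, retrieve the most similar chunk for the LLM's context
query = embed("what is self attention")
best_chunk, _ = max(index, key=lambda item: cosine(query, item[1]))
# best_chunk == "transformers use self attention"
```

The retrieved chunk is then pasted into the prompt, which is the whole trick: the LLM answers from context it was just handed rather than from memory.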
Breaking down MCP (Model Context Protocol) through a simple analogy: tools are functions, MCP servers are toolboxes, and LLMs can invoke them through natural language without any UI interaction.
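The analogy in miniature: a hypothetical tool registry, not the real MCP wire format, showing how a function becomes invocable by name (which is what lets a model call it from natural language):

```python
# A "tool" is just a function; the hypothetical get_blog below stands in
# for whatever the server actually exposes.
def get_blog(slug: str) -> dict:
    return {"slug": slug, "title": f"Post about {slug}"}

# The "toolbox": a server is essentially a named registry of such functions
TOOLS = {"get_blog": get_blog}

def invoke(tool_name: str, **kwargs):
    return TOOLS[tool_name](**kwargs)

result = invoke("get_blog", slug="transformers")
```

The model never touches a UI; it emits the tool name and arguments, and the server dispatches the call exactly like `invoke` does here.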
How I built a simple blog system into my portfolio using a custom API, MongoDB, and markdown—so I can write and publish from anywhere.
My honest beginner experience with n8n, why my first simple workflow took 12+ hours, and what I learned about automation, triggers, and platform limitations.