Thoughts on whatever I build, break, and learn in AI, engineering, and more.
A raw, practical breakdown of the heartbeat mechanism — what it is, how it works, and how I implemented it during my internship at AI Planet this week.
A beginner-friendly breakdown of CLAUDE.md and AGENTS.md — what they are, how they differ, and the simple setup that future-proofs your workflow across any AI coding tool.
How vLLM's paged attention borrows virtual memory concepts from operating systems to solve the KV cache memory fragmentation problem — making LLM inference faster and more memory-efficient at scale.
A clear, visual explanation of the KV Cache — the optimization that makes autoregressive text generation fast by storing Key and Value vectors instead of recomputing them for every new token.
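The core trick this post describes fits in a toy sketch: at each decoding step, only the new token's Key and Value vectors are computed and appended to a cache, and attention reuses every earlier entry instead of recomputing it. A minimal single-head version (made-up 2-d vectors standing in for real projections, not the post's code):

```python
import math

def attend(q, keys, values):
    # One query attends over every cached position: dot-product
    # scores -> softmax weights -> weighted sum of value vectors.
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

k_cache, v_cache = [], []
# Each tuple stands in for the (K, V, Q) a real model would project
# from the newest token's hidden state.
steps = [([1.0, 0.0], [0.5, 0.5], [1.0, 1.0]),
         ([0.0, 1.0], [0.2, 0.8], [1.0, 0.0]),
         ([1.0, 1.0], [0.9, 0.1], [0.0, 1.0])]
for k, v, q in steps:
    # Append only the new token's K and V; all older entries are
    # reused, never recomputed -- that reuse is the KV cache.
    k_cache.append(k)
    v_cache.append(v)
    out = attend(q, k_cache, v_cache)
```

Without the cache, every step would recompute K and V for the whole prefix, which is what makes naive autoregressive generation quadratic in practice.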
AI memory isn't like human memory — models forget everything. What we call memory is actually smart storing, searching, and injecting context at the right time using external systems.
A clear breakdown of what the encoder and decoder each do in a Transformer — their internal structure, how multi-head self-attention works, what cross-attention is, and when you'd use encoder-only vs decoder-only vs full encoder-decoder models.
How self-attention produces contextual embeddings by computing a weighted sum of value vectors — and what it means that the same word gets a different representation depending on the sentence it appears in.
A deep dive into the softmax function — why it's used in self-attention, how it converts raw dot product scores into probabilities, and why the numerically stable variant (subtracting the max) matters in practice.
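The numerically stable variant mentioned here is small enough to show inline (a generic sketch, not the post's code): subtracting the max before exponentiating leaves the output unchanged, because the constant cancels in the ratio, while keeping every exponent non-positive so `exp()` never overflows.

```python
import math

def softmax(scores):
    # Naive exp() overflows for large scores; shifting by the max
    # changes nothing mathematically (the shift cancels in the
    # ratio) but keeps every exponent <= 0, so exp() stays finite.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Scores this large make the unshifted version overflow to inf;
# the stable variant handles them fine.
weights = softmax([1000.0, 1001.0, 1002.0])
```

The result is identical to `softmax([0.0, 1.0, 2.0])`, which is exactly why the trick is safe.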
A full step-by-step numerical walkthrough of self-attention using the sentence "please study man" — computing Q, K, V vectors, raw attention scores, softmax weights, and final contextual output vectors from scratch.
A deep intuitive breakdown of the Q, K, V mechanism in self-attention — using a database retrieval analogy and real weight matrix math to show exactly how Transformers decide which words to attend to.
A deep dive into the Transformer architecture introduced in the landmark 2017 paper — what it is, how it works, why it replaced RNNs, and why every modern AI model from GPT to Gemini traces its roots here.
Breaking down the difference between training, fine-tuning, and inference — why they're not the same thing, what actually happens in each stage, and why understanding this makes LLM systems way less confusing.
Why tokenization is the most underrated part of LLMs — how tokens aren't words, why they affect cost and performance, and why bad tokenization breaks everything downstream.
A complete breakdown of encoder-decoder architectures — how they compress sequences into context vectors, generate outputs step-by-step, why teacher forcing matters, and the four key limitations that led to attention mechanisms.
Building a fine-tuned AI to translate legal jargon into plain English — from FLAN-T5 failures to Gemma-2B success using QLoRA on a free GPU, and the engineering lessons learned along the way.
How I exposed my portfolio blog system as an MCP server so Claude could operate it with natural language — and the 5 small but painful bugs that stood in the way.
An honest, unstructured brain dump about embeddings, vector databases, and re-ranking — from confusion about what the numbers mean to understanding coordinates, similarity search, and retrieval optimization.
Breaking down quantization from scary optimization technique to simple concept — how reducing bit precision makes models smaller and faster, and why calibration matters more than the math.
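The "reducing bit precision" idea can be sketched in a few lines of symmetric int8 quantization (one illustrative scheme among several; a real pipeline adds calibration and per-channel scales):

```python
def quantize_int8(values):
    # Symmetric int8 quantization: map the float range
    # [-max_abs, max_abs] onto integers [-127, 127] via one scale.
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; the rounding error left over is
    # the quantization noise that calibration tries to minimize.
    return [x * scale for x in q]

weights = [0.42, -1.3, 0.07, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each stored value shrinks from 32 bits to 8, and the reconstruction error is bounded by half the scale — which is why picking the right range (calibration) matters more than the arithmetic itself.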
A practical inference benchmark comparing DistilBERT performance on CPU vs GPU — measuring latency, throughput, and memory across different batch sizes to understand what actually happens in production.
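The measurement pattern behind a benchmark like this is easy to sketch (a generic harness with a stand-in workload, not the post's actual script): warm up first, then time repeated runs and derive latency per batch and items per second.

```python
import time

def benchmark(fn, batch, runs=50, warmup=5):
    # Warm up so one-time costs (allocation, caching, lazy init)
    # don't pollute the timed runs.
    for _ in range(warmup):
        fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        fn(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / runs * 1000          # time per batch
    throughput = runs * len(batch) / elapsed    # items per second
    return latency_ms, throughput

# Stand-in workload; a real benchmark would call the model's forward pass.
latency, throughput = benchmark(lambda b: [x * x for x in b], list(range(1000)))
```

Sweeping the batch size through a harness like this is what reveals the CPU/GPU crossover: GPUs amortize launch overhead across large batches, so throughput climbs with batch size while per-batch latency grows.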
Learning machine learning alone in a Tier-3 city without mentors, bootcamps, or a tech ecosystem — why constraints became advantages and how building in public taught me more than any course.