Thoughts on whatever I build, break, and learn in AI, engineering, and more.
How speculative decoding makes LLM inference 2-13x faster by having a small draft model propose tokens and a big model verify them in parallel — with zero quality loss.
A former Mastra contributor's dormant npm account was hijacked to inject a crypto-stealing RAT into 144 packages with 1.1M weekly downloads — all in 88 minutes.
Anthropic buried a policy in Fable 5's 319-page system card that silently degraded responses for AI researchers building competing models — using invisible steering vectors instead of visible refusals. The backlash forced a reversal in 48 hours.
Breaking down Hermes Agent from Nous Research — what skills, GEPA, and self-improving workflows actually mean under the hood. No hype, just how it works.
I spent 2 hours decoding Airtel's prepaid plans so you don't have to. The ₹279 OTT pack, the ₹51 5G booster, the ₹469 voice plan — each has a quiet gotcha nobody puts in the headline. Here's the one plan that actually works for a Wi-Fi-first life, and why.
How I turned an always-on office PC into a remote dev server I can reach from my phone anywhere — Tailscale + Windows OpenSSH + per-user sandboxing — and the SSH-key leak that taught me why "just make everyone an admin" is a trap.
Most Claude Code users hit /compact like a panic button without ever asking what it actually does. Here's the full breakdown: when it saves you, when it burns you, and the focus trick nobody uses.
A heartfelt reflection on my 1 year 7 months at Recursive Zero — how a Twitter reply turned into the most defining chapter of my early career, and why I'd make the same choice again.
Harness Engineering is the defining AI discipline of 2026 — the system around the model matters 6x more than the model itself. Here's what it is, why it matters, and how to start building one.
How moving from Cloudflare Workers to Node.js exposed 3 layers of polyfill chaos — process.env, fetch, and Buffer all silently replaced by browser shims in the server bundle.
A deep technical dive into why outbound API calls from WebContainer-based apps fail with socket hang up and CORS errors, what the real architecture looks like under the hood, and the proxy pattern that actually works.
I analyzed 35+ conversation logs from Claude Code sessions to find real patterns in my daily work — then built custom slash commands and agents to automate the repetitive stuff. Here's exactly how I did it and what I built.
A deep dive into debugging and fixing the Warp terminal plugin for Claude Code on Windows — from tracing OSC escape sequences to building native Windows toast notifications.
MLA is the reason DeepSeek can serve a 671B model cheaply and fast. Here's how it actually works — no research paper vibes, just the real idea explained simply.
A raw, practical breakdown of the heartbeat mechanism — what it is, how it works, and how I implemented it during my internship at AI Planet this week.
A beginner-friendly breakdown of CLAUDE.md and AGENTS.md — what they are, how they differ, and the simple setup that future-proofs your workflow across any AI coding tool.
How vLLM's paged attention borrows virtual memory concepts from operating systems to solve the KV cache memory fragmentation problem — making LLM inference faster and more memory-efficient at scale.
A clear, visual explanation of the KV Cache — the optimization that makes autoregressive text generation fast by storing Key and Value vectors instead of recomputing them for every new token.
AI memory isn't like human memory: models forget everything. What we call memory is actually external systems storing, searching, and injecting context at the right time.
A clear breakdown of what the encoder and decoder each do in a Transformer — their internal structure, how multi-head self-attention works, what cross-attention is, and when you'd use encoder-only vs decoder-only vs full encoder-decoder models.