How I Built an AI-Powered Blog Recommendation System From Scratch
Building a semantic blog recommendation system from scratch using embeddings, vector databases, and pre-computed results—why tags aren't enough and how I integrated ML into my Next.js portfolio.
The Problem
After writing several technical blogs on my portfolio, I noticed a gap: readers finishing one article had no easy way to discover related content. Sure, I could add "Related Posts" based on shared tags, but that felt... boring.
Tags are lazy matching.
A blog tagged "automation" and another tagged "workflow" might be deeply related conceptually, but keyword matching would never connect them. I wanted something smarter.
I wanted my portfolio to actually understand what each blog was about.
The Idea: Semantic Recommendations
What if I could convert each blog into a mathematical representation that captures its meaning, not just its words?
This is exactly what embedding models do. They transform text into high-dimensional vectors (768 numbers, in my case) where similar content ends up close together in vector space.
"Building automated workflows with n8n" → [0.23, -0.45, 0.12, ...]
"Creating CI/CD pipelines" → [0.21, -0.42, 0.15, ...] # Similar!
"Baking chocolate chip cookies" → [-0.67, 0.89, -0.23, ...] # Very different
By measuring how close these vectors are (cosine similarity), I could find which blogs are semantically similar, even if they don't share a single keyword.
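To make "close in vector space" concrete, here is a minimal cosine similarity sketch using NumPy and the toy three-number vectors above (illustrative only, not the real 768-dimensional embeddings):

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the
    # vector magnitudes; 1.0 means the vectors point the same way.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

workflows = [0.23, -0.45, 0.12]
pipelines = [0.21, -0.42, 0.15]
cookies   = [-0.67, 0.89, -0.23]

print(cosine_similarity(workflows, pipelines))  # high score: related topics
print(cosine_similarity(workflows, cookies))    # low (here negative) score: unrelated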
The Architecture
Here's what I built:
┌─────────────────────────────────────────────────────────────────┐
│ RECOMMENDATION PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MongoDB Python Pipeline Pinecone Next.js │
│ ┌──────┐ ┌──────────────┐ ┌──────────┐ ┌────────┐ │
│ │Blogs │──────►│ Preprocess │───►│ Vector │◄──│ API │ │
│ │ │ │ Embed │ │ Store │ │ Route │ │
│ └──────┘ │ Compute │ └──────────┘ └────────┘ │
│ └──────────────┘ │ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ └─────────────────────────►│ JSON │ │
│ Pre-computed results │ Export │ │
│ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
Tech Stack:
- Database: MongoDB (blog storage)
- ML Pipeline: Python 3.10+
- Embeddings: Google AI text-embedding-004
- Vector DB: Pinecone (serverless)
- Frontend: Next.js 15 (App Router)
- Hosting: Vercel Edge
Step 1: Preprocessing
Raw blog content is messy. It has markdown syntax, code blocks, URLs, and formatting that adds noise to embeddings. The preprocessing pipeline cleans all this up.
def preprocess_blog(blog):
    # Remove code blocks (keep a [code] marker for context)
    content = remove_code_blocks(blog['content'])

    # Convert markdown to plain text
    plain_text = markdown_to_plain_text(content)

    # Clean URLs and special characters
    clean_content = clean_text(plain_text)

    # Combine fields with weighted importance (title and summary first)
    return f"""
Title: {blog['title']}
Summary: {blog['description']}
Topics: {', '.join(blog['tags'])}
Content: {clean_content}
"""
The key insight: title and description should come first because embedding models give more weight to earlier content.
Step 2: Generating Embeddings
I used Google AI's text-embedding-004 model, which produces 768-dimensional vectors optimized for semantic search.
from google import genai

def generate_embedding(text):
    client = genai.Client(api_key=API_KEY)
    result = client.models.embed_content(
        model="text-embedding-004",
        contents=text,
        config={"task_type": "RETRIEVAL_DOCUMENT"},
    )
    return result.embeddings[0].values  # 768 floats
Why Google AI?
- Free tier is generous
- High-quality embeddings
- Fast inference
- Specifically optimized for retrieval tasks
Step 3: Vector Storage with Pinecone
Embeddings are useless if you can't search them efficiently. Pinecone provides millisecond-latency similarity search at any scale.
from pinecone import Pinecone

def store_embeddings(blogs_with_embeddings):
    pc = Pinecone(api_key=PINECONE_API_KEY)
    index = pc.Index("portfolio-blog-embedding")

    vectors = [
        {
            "id": blog["slug"],
            "values": blog["embedding"],
            "metadata": {
                "title": blog["title"],
                "description": blog["description"],
                "tags": blog["tags"],
            },
        }
        for blog in blogs_with_embeddings
    ]

    index.upsert(vectors=vectors)
Index Configuration:
- Dimension: 768 (matching Google AI output)
- Metric: Cosine similarity
- Cloud: AWS us-east-1 (serverless)
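Creating that index is a one-time setup step. With the current Pinecone SDK it looks roughly like this (a sketch, reusing the index name and configuration above):

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=PINECONE_API_KEY)
pc.create_index(
    name="portfolio-blog-embedding",
    dimension=768,     # must match text-embedding-004's output size
    metric="cosine",   # cosine similarity, as configured above
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)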
Step 4: Computing Recommendations
For each blog, I query Pinecone to find the top-K most similar blogs (excluding itself).
def find_similar_blogs(slug, top_k=3):
    # Fetch this blog's embedding from the index
    result = index.fetch(ids=[slug])
    query_embedding = result.vectors[slug].values

    # Search for the nearest neighbours
    results = index.query(
        vector=query_embedding,
        top_k=top_k + 1,  # +1 because the blog itself comes back as a match
        include_metadata=True,
    )

    # Drop the blog itself and return the top_k matches
    return [
        {
            "slug": match.id,
            "title": match.metadata["title"],
            "score": match.score,
        }
        for match in results.matches
        if match.id != slug
    ][:top_k]
Step 5: Pre-computing for Performance
Here's the clever part: I don't run ML inference at request time. Instead, I pre-compute all recommendations and export them as JSON.
import json

def export_recommendations():
    recommendations = {}
    for blog in all_blogs:
        recommendations[blog["slug"]] = find_similar_blogs(blog["slug"])

    with open("data/recommendations.json", "w") as f:
        json.dump(recommendations, f)
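The exported file is just a map from each blog's slug to its recommendation list. Illustratively (the slugs and scores below are placeholders):

{
  "my-first-n8n-workflow": [
    { "slug": "how-i-built-blog-system", "title": "How I Built Blog System", "score": 0.66 },
    { "slug": "understanding-mcp", "title": "Understanding MCP", "score": 0.59 }
  ]
}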
The Next.js API route simply reads this file:
// /api/recommendations/[slug]/route.ts
import fs from "node:fs"
import path from "node:path"

export async function GET(
  request: Request,
  { params }: { params: Promise<{ slug: string }> }
) {
  const { slug } = await params  // params is async in Next.js 15
  const file = path.join(process.cwd(), "data", "recommendations.json")
  const data = JSON.parse(fs.readFileSync(file, "utf-8"))
  return Response.json({ recommendations: data[slug] ?? [] })
}
Why pre-compute?
- Zero latency (just file read)
- No API costs at runtime
- Works on serverless/edge
- Scales infinitely
The Results
Here's the similarity matrix for my first 3 blogs:
                           n8n Workflow   Blog System   MCP Servers
My First n8n Workflow          1.0000        0.6614        0.5866
How I Built Blog System        0.6614        1.0000        0.6361
Understanding MCP              0.5866        0.6361        1.0000
The system correctly identified that:
- "n8n Workflow" and "Blog System" are most similar (both about building things)
- "MCP Servers" relates more to "Blog System" than "n8n" (both about developer tooling)
These connections make intuitive sense — the embeddings captured the essence of each blog.
The Frontend Component
A clean React component displays recommendations at the end of each blog:
"use client"

import { useEffect, useState } from "react"
import Link from "next/link"

export function BlogRecommendations({ currentSlug }) {
  const [recommendations, setRecommendations] = useState([])

  useEffect(() => {
    fetch(`/api/recommendations/${currentSlug}`)
      .then(res => res.json())
      .then(data => setRecommendations(data.recommendations ?? []))
  }, [currentSlug])

  return (
    <section>
      <h2>✨ You might also enjoy</h2>
      {recommendations.map(rec => (
        <Link key={rec.slug} href={`/blog/${rec.slug}`}>
          <h3>{rec.title}</h3>
          <span>{Math.round(rec.score * 100)}% match</span>
        </Link>
      ))}
    </section>
  )
}
Monthly Retraining
When I write new blogs, I simply run:
cd ml
python scripts/train.py
The entire pipeline — fetching, preprocessing, embedding, storing, computing — runs in about 15 seconds for 6 blogs. The JSON file updates, and the website automatically serves new recommendations.
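Under the hood, train.py is little more than an orchestrator for the steps above. Conceptually it looks something like this (simplified; fetch_blogs_from_mongodb is a hypothetical stand-in for my MongoDB query):

def main():
    blogs = fetch_blogs_from_mongodb()                 # pull published posts
    for blog in blogs:
        text = preprocess_blog(blog)                   # Step 1: clean + structure
        blog["embedding"] = generate_embedding(text)   # Step 2: 768-dim vector
    store_embeddings(blogs)                            # Step 3: upsert to Pinecone
    export_recommendations()                           # Steps 4-5: query + write JSON

if __name__ == "__main__":
    main()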
What I Learned
Preprocessing is Crucial: Garbage in, garbage out. Spending time cleaning markdown, removing noise, and structuring input dramatically improved embedding quality.
Embeddings are Magic: The way modern embedding models capture semantic meaning is remarkable. Blogs about completely different topics but similar intent end up close in vector space.
Pre-computation Wins: For small-to-medium datasets, pre-computing everything offline is simpler, faster, and cheaper than real-time inference.
Integration is the Hard Part: Building the ML pipeline was one thing. Integrating it cleanly with a Next.js application, handling errors gracefully, and creating a good UX took just as much effort.
Why I Built Instead of Bought
I could have used a plugin. I could have called a recommendation API. But building from scratch gave me:
- Deep understanding of how embeddings and vector search work
- Full control over preprocessing and similarity logic
- Portfolio proof that I can integrate ML with web applications
- A story worth sharing (you're reading it!)