How I Built an AI-Powered Blog Recommendation System From Scratch
Building a semantic blog recommendation system from scratch using embeddings, vector databases, and pre-computed results—why tags aren't enough and how I integrated ML into my Next.js portfolio.
The Problem
After writing several technical blogs on my portfolio, I noticed a gap: readers finishing one article had no easy way to discover related content. Sure, I could add "Related Posts" based on shared tags, but that felt... boring.
Tags are lazy matching.
A blog tagged "automation" and another tagged "workflow" might be deeply related conceptually, but keyword matching would never connect them. I wanted something smarter.
I wanted my portfolio to actually understand what each blog was about.
The Idea: Semantic Recommendations
What if I could convert each blog into a mathematical representation that captures its meaning, not just its words?
This is exactly what embedding models do. They transform text into high-dimensional vectors (768 numbers, in my case) where similar content ends up close together in vector space.
"Building automated workflows with n8n" → [0.23, -0.45, 0.12, ...]
"Creating CI/CD pipelines" → [0.21, -0.42, 0.15, ...] # Similar!
"Baking chocolate chip cookies" → [-0.67, 0.89, -0.23, ...] # Very different
By measuring how close these vectors are (cosine similarity), I could find which blogs are semantically similar, even if they don't share a single keyword.
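To make "close in vector space" concrete, here is a minimal cosine similarity sketch using NumPy and the toy three-number vectors above (illustrative only, not the real 768-dimensional embeddings):

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the
    # vector magnitudes; 1.0 means the vectors point the same way.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

workflows = [0.23, -0.45, 0.12]
pipelines = [0.21, -0.42, 0.15]
cookies   = [-0.67, 0.89, -0.23]

print(cosine_similarity(workflows, pipelines))  # high score: related topics
print(cosine_similarity(workflows, cookies))    # low (here negative) score: unrelated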
The Architecture
Here's what I built:
┌─────────────────────────────────────────────────────────────────┐
│ RECOMMENDATION PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MongoDB Python Pipeline Pinecone Next.js │
│ ┌──────┐ ┌──────────────┐ ┌──────────┐ ┌────────┐ │
│ │Blogs │──────►│ Preprocess │───►│ Vector │◄──│ API │ │
│ │ │ │ Embed │ │ Store │ │ Route │ │
│ └──────┘ │ Compute │ └──────────┘ └────────┘ │
│ └──────────────┘ │ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ └─────────────────────────►│ JSON │ │
│ Pre-computed results │ Export │ │
│ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
Tech Stack:
- Database: MongoDB (blog storage)
- ML Pipeline: Python 3.10+
- Embeddings: Google AI text-embedding-004
- Vector DB: Pinecone (serverless)
- Frontend: Next.js 15 (App Router)
- Hosting: Vercel Edge
Step 1: Preprocessing
Raw blog content is messy. It has markdown syntax, code blocks, URLs, and formatting that adds noise to embeddings. The preprocessing pipeline cleans all this up.
def preprocess_blog(blog):
    # Remove code blocks (keep a [code] marker for context)
    content = remove_code_blocks(blog['content'])

    # Convert markdown to plain text
    plain_text = markdown_to_plain_text(content)

    # Clean URLs and special characters
    clean_content = clean_text(plain_text)

    # Combine fields with weighted importance (title and summary first)
    return f"""
Title: {blog['title']}
Summary: {blog['description']}
Topics: {', '.join(blog['tags'])}
Content: {clean_content}
"""
The key insight: title and description should come first because embedding models give more weight to earlier content.
Step 2: Generating Embeddings
I used Google AI's text-embedding-004 model, which produces 768-dimensional vectors optimized for semantic search.
from google import genai

def generate_embedding(text):
    client = genai.Client(api_key=API_KEY)
    result = client.models.embed_content(
        model="text-embedding-004",
        contents=text,
        config={"task_type": "RETRIEVAL_DOCUMENT"},
    )
    return result.embeddings[0].values  # 768 floats
Why Google AI?
- Free tier is generous
- High-quality embeddings
- Fast inference
- Specifically optimized for retrieval tasks
Step 3: Vector Storage with Pinecone
Embeddings are useless if you can't search them efficiently. Pinecone provides millisecond-latency similarity search at any scale.
from pinecone import Pinecone

def store_embeddings(blogs_with_embeddings):
    pc = Pinecone(api_key=PINECONE_API_KEY)
    index = pc.Index("portfolio-blog-embedding")

    vectors = [
        {
            "id": blog["slug"],
            "values": blog["embedding"],
            "metadata": {
                "title": blog["title"],
                "description": blog["description"],
                "tags": blog["tags"],
            },
        }
        for blog in blogs_with_embeddings
    ]

    index.upsert(vectors=vectors)
Index Configuration:
- Dimension: 768 (matching Google AI output)
- Metric: Cosine similarity
- Cloud: AWS us-east-1 (serverless)
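Creating that index is a one-time setup step. With the current Pinecone SDK it looks roughly like this (a sketch, reusing the index name and configuration above):

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=PINECONE_API_KEY)
pc.create_index(
    name="portfolio-blog-embedding",
    dimension=768,     # must match text-embedding-004's output size
    metric="cosine",   # cosine similarity, as configured above
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)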
Step 4: Computing Recommendations
For each blog, I query Pinecone to find the top-K most similar blogs (excluding itself).
def find_similar_blogs(slug, top_k=3):
    # Fetch this blog's embedding from the index
    result = index.fetch(ids=[slug])
    query_embedding = result.vectors[slug].values

    # Search for the nearest neighbours
    results = index.query(
        vector=query_embedding,
        top_k=top_k + 1,  # +1 because the blog itself comes back as a match
        include_metadata=True,
    )

    # Drop the blog itself and return the top_k matches
    return [
        {
            "slug": match.id,
            "title": match.metadata["title"],
            "score": match.score,
        }
        for match in results.matches
        if match.id != slug
    ][:top_k]
Step 5: Pre-computing for Performance
Here's the clever part: I don't run ML inference at request time. Instead, I pre-compute all recommendations and export them as JSON.
import json

def export_recommendations():
    recommendations = {}
    for blog in all_blogs:
        recommendations[blog["slug"]] = find_similar_blogs(blog["slug"])

    with open("data/recommendations.json", "w") as f:
        json.dump(recommendations, f)
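The exported file is just a map from each blog's slug to its recommendation list. Illustratively (the slugs and scores below are placeholders):

{
  "my-first-n8n-workflow": [
    { "slug": "how-i-built-blog-system", "title": "How I Built Blog System", "score": 0.66 },
    { "slug": "understanding-mcp", "title": "Understanding MCP", "score": 0.59 }
  ]
}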
The Next.js API route simply reads this file:
// /api/recommendations/[slug]/route.ts
import fs from "node:fs"
import path from "node:path"

export async function GET(
  request: Request,
  { params }: { params: Promise<{ slug: string }> }
) {
  const { slug } = await params  // params is async in Next.js 15
  const file = path.join(process.cwd(), "data", "recommendations.json")
  const data = JSON.parse(fs.readFileSync(file, "utf-8"))
  return Response.json({ recommendations: data[slug] ?? [] })
}
Why pre-compute?
- Zero latency (just file read)
- No API costs at runtime
- Works on serverless/edge
- Scales infinitely
The Results
Here's the similarity matrix for my first 3 blogs:
                           n8n Workflow   Blog System   MCP Servers
My First n8n Workflow          1.0000        0.6614        0.5866
How I Built Blog System        0.6614        1.0000        0.6361
Understanding MCP              0.5866        0.6361        1.0000
The system correctly identified that:
- "n8n Workflow" and "Blog System" are most similar (both about building things)
- "MCP Servers" relates more to "Blog System" than "n8n" (both about developer tooling)
These connections make intuitive sense — the embeddings captured the essence of each blog.
The Frontend Component
A clean React component displays recommendations at the end of each blog:
"use client"

import { useEffect, useState } from "react"
import Link from "next/link"

export function BlogRecommendations({ currentSlug }) {
  const [recommendations, setRecommendations] = useState([])

  useEffect(() => {
    fetch(`/api/recommendations/${currentSlug}`)
      .then(res => res.json())
      .then(data => setRecommendations(data.recommendations ?? []))
  }, [currentSlug])

  return (
    <section>
      <h2>✨ You might also enjoy</h2>
      {recommendations.map(rec => (
        <Link key={rec.slug} href={`/blog/${rec.slug}`}>
          <h3>{rec.title}</h3>
          <span>{Math.round(rec.score * 100)}% match</span>
        </Link>
      ))}
    </section>
  )
}
Monthly Retraining
When I write new blogs, I simply run:
cd ml
python scripts/train.py
The entire pipeline — fetching, preprocessing, embedding, storing, computing — runs in about 15 seconds for 6 blogs. The JSON file updates, and the website automatically serves new recommendations.
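Under the hood, train.py is little more than an orchestrator for the steps above. Conceptually it looks something like this (simplified; fetch_blogs_from_mongodb is a hypothetical stand-in for my MongoDB query):

def main():
    blogs = fetch_blogs_from_mongodb()                 # pull published posts
    for blog in blogs:
        text = preprocess_blog(blog)                   # Step 1: clean + structure
        blog["embedding"] = generate_embedding(text)   # Step 2: 768-dim vector
    store_embeddings(blogs)                            # Step 3: upsert to Pinecone
    export_recommendations()                           # Steps 4-5: query + write JSON

if __name__ == "__main__":
    main()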
What I Learned
Preprocessing is Crucial: Garbage in, garbage out. Spending time cleaning markdown, removing noise, and structuring input dramatically improved embedding quality.
Embeddings are Magic: The way modern embedding models capture semantic meaning is remarkable. Blogs about completely different topics but similar intent end up close in vector space.
Pre-computation Wins: For small-to-medium datasets, pre-computing everything offline is simpler, faster, and cheaper than real-time inference.
Integration is the Hard Part: Building the ML pipeline was one thing. Integrating it cleanly with a Next.js application, handling errors gracefully, and creating a good UX took just as much effort.
Why I Built Instead of Bought
I could have used a plugin. I could have called a recommendation API. But building from scratch gave me:
- Deep understanding of how embeddings and vector search work
- Full control over preprocessing and similarity logic
- Portfolio proof that I can integrate ML with web applications
- A story worth sharing (you're reading it!)