I Built "Legal Lens" — A Fine-Tuned AI That Translates Legal Jargon Into Plain English
Building a fine-tuned AI to translate legal jargon into plain English—from FLAN-T5 failures to Gemma-2B success using QLoRA on a free GPU, and the engineering lessons learned along the way.
The Problem
Legal contracts are intentionally confusing. Phrases like "In witness whereof, the parties hereto have executed this Agreement..." exist to protect lawyers, not people.
I set out to change that by fine-tuning an open-source LLM to simplify legal clauses into plain English.
This is the raw, unfiltered engineering journey. No shortcuts, real failures, and a final win.
Phase 1: The Dataset Problem
I started with the CUAD dataset (510+ real contracts from SEC filings). But here's the catch—CUAD only has complex clauses. There are no "simplified" versions.
You can't teach a model to simplify if you don't show it what "simple" looks like.
So I engineered a custom parallel corpus of 2,000 complex-to-simple legal clause pairs covering:
- Termination
- Indemnification
- Confidentiality
- Liability
- And more
This was the foundation. Data quality over data quantity.
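Each record in the corpus is just a complex clause paired with its plain-English rewrite, plus a category tag. A minimal sketch of how I'd expect such a record to look as JSONL (the field names here are illustrative assumptions, not the actual schema):

```python
import json

# One training record: a complex clause paired with its plain-English rewrite.
# Field names ("complex", "simple", "category") are illustrative assumptions.
pair = {
    "complex": (
        "In witness whereof, the parties hereto have executed this "
        "Agreement as of the date first above written."
    ),
    "simple": "The parties signed this agreement on the date at the top.",
    "category": "execution",
}

# Stored as JSONL: one pair per line, trivial to stream into a trainer.
line = json.dumps(pair)
restored = json.loads(line)
print(restored["category"])  # execution
```

JSONL keeps each pair independent, so you can grow the corpus category by category without rewriting the file.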
Phase 2: The Failures (This Is Where the Real Learning Happened)
My first attempt was with FLAN-T5. It was a disaster.
What Went Wrong:
NaN losses due to fp16 instability with the T5 architecture:

```
# Training would just explode
Step 50: loss = NaN
Step 51: loss = NaN
```
Mode collapse — the model just repeated "This clause applies only to the parties" for every input, no matter what I fed it.
A high eval_loss of 4.97 with complete gibberish output. The model wasn't learning anything meaningful.
I tried fixing:
- Learning rates
- Padding strategies
- Deprecation errors (`evaluation_strategy` → `eval_strategy`, `as_target_tokenizer` removed)
Nothing worked well enough. The T5 architecture simply wasn't the right tool for this task.
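For context on those NaN losses: fp16 tops out just above 65,504, and T5's activations are known to overflow that ceiling, after which every downstream value becomes NaN. A quick demonstration of the failure mode:

```python
import numpy as np

# fp16's max finite value is ~65504; T5 activations can exceed it.
x = np.float16(60000) * np.float16(2)
print(x)      # inf — the overflow itself
print(x - x)  # nan — inf propagates to NaN, and the loss never recovers
```

Once a single NaN enters the loss, gradients are NaN everywhere and training is effectively dead from that step onward.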
Phase 3: The Pivot That Changed Everything
I switched to Google's Gemma-2B with a completely different approach:
The Setup:
4-bit Quantization (QLoRA) via bitsandbytes
- Fit the entire 2.5B param model on a free Colab T4 GPU
- Memory efficient, fast inference
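The loading step looks roughly like this (a sketch of the standard bitsandbytes NF4 recipe, not the exact notebook code):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit weights with fp16 compute: the usual QLoRA loading recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, designed for QLoRA
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=bnb_config,
    device_map="auto",
)
```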
LoRA adapters on q_proj and v_proj
- Only 921,600 trainable params (0.037% of total!)
- Full model stays frozen, adapters do the learning
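That 921,600 figure checks out arithmetically. LoRA adds two low-rank matrices per target projection, A (d_in × r) and B (r × d_out), i.e. r·(d_in + d_out) parameters each. Assuming Gemma-2B's published dimensions (18 layers, hidden size 2048, multi-query attention so v_proj outputs only 256 dims) and a LoRA rank of 8:

```python
# Assumed Gemma-2B dims: 18 layers, hidden 2048, q_proj out 2048,
# v_proj out 256 (multi-query attention), LoRA rank r = 8.
r, layers, hidden = 8, 18, 2048
q_out, v_out = 2048, 256

# r * (d_in + d_out) params per adapted projection, two projections per layer.
per_layer = r * (hidden + q_out) + r * (hidden + v_out)
total = layers * per_layer
print(total)                           # 921600 trainable params
print(round(total / 2.5e9 * 100, 3))  # ~0.037 (% of the ~2.5B total)
```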
SFTTrainer from TRL library with Gemma's native chat template
- Clean training loop
- Proper instruction formatting
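Gemma's chat template wraps each side of the conversation in turn markers. In practice the tokenizer's `apply_chat_template` handles this, but the rendered text looks roughly like the hand-rolled illustration below (the instruction wording is my assumption, not the notebook's exact prompt):

```python
def format_pair(clause: str, simple: str) -> str:
    """Render one training pair with Gemma-style turn markers.
    The instruction wording is an illustrative assumption."""
    return (
        "<start_of_turn>user\n"
        f"Simplify this legal clause into plain English:\n{clause}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{simple}<end_of_turn>\n"
    )

text = format_pair(
    "In witness whereof, the parties hereto have executed this Agreement...",
    "The parties signed this agreement.",
)
print(text)
```

Getting this formatting right matters: train on one template and prompt with another, and the model's outputs degrade sharply.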
Training config:

```python
training_args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    optim="paged_adamw_8bit",
    logging_steps=25,
)
```
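With gradient accumulation, the effective batch size is 2 × 4 = 8. Assuming all 2,000 pairs go through training, the run's step budget works out to:

```python
pairs, per_device, accum, epochs = 2000, 2, 4, 3

effective_batch = per_device * accum        # 8 examples per optimizer step
steps_per_epoch = pairs // effective_batch  # 250 optimizer steps per epoch
total_steps = steps_per_epoch * epochs      # 750 steps across 3 epochs
print(effective_batch, steps_per_epoch, total_steps)
```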
The Results
Training loss dropped beautifully: 3.58 → 0.77 in just 25 minutes.
Before:
"In witness whereof, the parties hereto have executed this Agreement as of the date first above written."
After (Model Output):
"The parties that are mentioned in this clause have done everything that is in the next clause."
It works. The model actually understands legal structure and translates it.
Key Engineering Lessons
1. Data Quality > Data Quantity
2,000 well-crafted pairs beat 10,000 generic ones. Every pair I created taught the model something specific about legal-to-plain translation.
2. Model Choice Matters Enormously
FLAN-T5 failed at this task. Gemma-2B nailed it. Architecture isn't just a detail—it's the foundation.
3. QLoRA Is a Game-Changer
Fine-tuning a 2.5B model on a free GPU? That's democratization of AI. No expensive cloud credits. No fancy hardware. Just smart engineering.
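Rough back-of-the-envelope math shows why it fits (ignoring activations, the tiny adapter's optimizer state, and quantization overhead):

```python
params = 2.5e9

fp16_gb = params * 2 / 1e9   # ~5.0 GB just for fp16 weights
nf4_gb = params * 0.5 / 1e9  # ~1.25 GB at 4 bits per weight
t4_vram_gb = 16              # a free Colab T4's VRAM

print(fp16_gb, nf4_gb)  # 5.0 1.25 — 4-bit leaves plenty of the 16 GB free
```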
4. Fail Fast, Pivot Faster
The FLAN-T5 failures taught me more about NLP engineering than any tutorial. Understanding why something doesn't work is just as valuable as making something work.
Tech Stack
- Python
- Hugging Face Transformers
- PEFT/LoRA
- TRL (Transformer Reinforcement Learning)
- bitsandbytes (quantization)
- Google Colab (T4 GPU)
Try It Yourself
The full notebook is on GitHub: Legal Lens - Clause Simplifier
If you're working on LegalTech or domain-specific LLM fine-tuning, let's connect!
Legal jargon doesn't have to be a barrier. With the right approach, open-source models, and some persistence, we can make legal documents accessible to everyone.