From N-Grams to Micro-Models: Training AI on a 1080 Ti

November 24, 2025, 10:15 AM CST · Wayne Workman · 5 min read

Years ago, before I knew what a "tokenizer" was, I wrote a script called sentence-finish. It looked at the last three words of a sentence to predict the fourth. I thought I had invented magic. Later, while watching an educational video, I realized I had independently discovered n-grams.
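
For anyone who hasn't bumped into n-grams before, the core idea is just a lookup table from recent words to the word that tends to follow them. Here's a minimal sketch of what a script like sentence-finish boils down to (my reconstruction for illustration, not the original code):

```python
from collections import Counter, defaultdict

def build_model(text, n=3):
    """Map every 3-word context in the corpus to the words that follow it."""
    words = text.split()
    model = defaultdict(Counter)
    for i in range(len(words) - n):
        context = tuple(words[i:i + n])
        model[context][words[i + n]] += 1
    return model

def predict(model, last_three):
    """Return the most common continuation of the given 3-word context."""
    context = tuple(last_three)
    if context not in model:
        return None  # unseen context: the classic n-gram sparsity problem
    return model[context].most_common(1)[0][0]

corpus = "the war ended in 1945 and the war changed the world"
model = build_model(corpus)
print(predict(model, ["the", "war", "ended"]))  # -> "in"
```

That `None` branch is the weakness: any three-word context the model hasn't seen verbatim leaves it with nothing to say, which is exactly the problem subword tokenization and neural models chip away at.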

My plan at the time was to improve the script by moving from whole words to subword tokens using Google's SentencePiece. But technology moved faster than I did, and I looked up one day to find the world had moved on to generative AI.
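
For context, here's roughly what that word-to-token move would have looked like with SentencePiece; the file paths and vocab size are placeholders, since this version never actually got built:

```python
import sentencepiece as spm

# Train a small subword vocabulary on a plain-text corpus
# ("corpus.txt", "ww2", and 8000 are placeholder values).
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="ww2", vocab_size=8000
)

sp = spm.SentencePieceProcessor(model_file="ww2.model")
print(sp.encode("The Battle of Midway began in June 1942.", out_type=str))
# Something like ['▁The', '▁Battle', '▁of', '▁Mid', 'way', ...] -- the exact
# pieces depend entirely on the training corpus.
```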

So I decided to catch up.

Here's the devlog video with a glimpse into my latest explorations:

Important Note

As always, this project used personal time, a personal laptop, personal AWS accounts, personal money, and personal oxygen. It is a personal endeavor, conducted entirely on private time with personal resources, completely independent of and unrelated to my professional employment.

The Rig: Frankenstein Computing

I recently retired my daily driver, an ancient Quadro 6000. That card was a nightmare of 400 cores, low RAM, and driver hell. I had to use Ubuntu 20 and an archived version of Steam just to make it function, which was frustrating, but I learned a lot about GPU drivers in the process.

I've upgraded to a GeForce 1080 Ti. It's old, but with 11 GB of VRAM and 3,500+ CUDA cores it's a powerhouse compared to what I had. It lives inside a Dell Optiplex 9020 with a custom power adapter and mismatched RAM (waiting on the upgrade!).

It requires older drivers and a specific PyTorch stack (torch 2.7.1+cu118), but it works.
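
If you're fighting a similar driver/toolkit mismatch, the quickest sanity check is to confirm that the CUDA build of PyTorch actually sees the card:

```python
import torch

# Confirm the cu118 build of PyTorch can see and use the 1080 Ti.
print(torch.__version__)             # e.g. 2.7.1+cu118
print(torch.cuda.is_available())     # True once driver and toolkit line up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the 1080 Ti
    props = torch.cuda.get_device_properties(0)
    print(props.total_memory // 2**20, "MiB of VRAM")
```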

The Project: A WWII Micro-Model

I'm building a model from scratch. Not fine-tuning, not RAG, but training from scratch, focused entirely on World War II.

The Challenge: Time Travel

Early in training I noticed the model hallucinating facts about people who were born during WWII but whose careers happened in the 80s. The model was connecting related entities across time periods, which made for some interesting but completely wrong outputs. I had to write a post-filtering system to parse dates and weight articles by frequency, strictly gating content to pre-WWII, the war years, and the immediate aftermath.
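
In spirit, the gating logic looks something like the sketch below. The year cutoffs and the majority-vote rule here are illustrative stand-ins, not the production values:

```python
import re

YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")

def keep_article(text, start=1918, end=1950):
    """Keep an article only if most of the years it mentions fall inside
    the allowed window (cutoffs and threshold are illustrative)."""
    years = [int(y) for y in YEAR.findall(text)]
    if not years:
        return False  # undatable articles get dropped rather than trusted
    in_window = sum(start <= y <= end for y in years)
    return in_window / len(years) >= 0.5

print(keep_article("Born in 1943, he rose to fame in 1985, 1986 and 1987."))  # False
print(keep_article("The invasion began in 1944 and ended in 1945."))          # True
```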

The loss immediately dropped to 2.2.

The Teacher: Llama 3.2

To push the model further, I needed a massive question-and-answer dataset. Instead of writing it manually, I turned to Llama 3.2 (3B).

I benchmarked Llama 3.2 against Phi-3, Gemma-2, Qwen-2.5, and Flan-T5. For my specific hardware and context needs, Llama was the clear winner in speed and quality. I'm now using it to generate thousands of synthetic Q&A pairs to teach my micro-model how to chat.
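
The generation loop itself is nothing exotic. A stripped-down sketch with the Hugging Face transformers pipeline looks like this; the model ID, prompt, and sampling settings are illustrative rather than the exact ones I'm running:

```python
import torch
from transformers import pipeline

# Llama 3.2 3B Instruct in fp16 should fit comfortably in the 1080 Ti's 11 GB.
# device_map="auto" needs the accelerate package installed.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

passage = "The Battle of Midway took place in June 1942."
messages = [
    {"role": "system", "content": "Write one question and answer pair about the passage."},
    {"role": "user", "content": passage},
]

out = generator(messages, max_new_tokens=128, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][-1]["content"])  # the generated Q&A pair
```

Loop that over filtered WWII passages and you have a synthetic Q&A corpus without typing a single question by hand.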

Why Do This From Scratch?

I could have just fine-tuned Llama. I could have built a RAG system. It would have been easier.

But I wanted to learn.

Going from hacking together n-grams to managing a modern LLM workflow has been like drinking from a firehose. My only regret is not diving into Hugging Face and its ecosystem sooner, because the tools available now make things possible that would have seemed like science fiction just a few years ago. We have the tools and the AI assistance (shout out to Claude Code) to build anything. Everything is possible now.
