Python NLP Language Modelling

N-Gram Language Model

A from-scratch n-gram language model trained on the Lord of the Rings trilogy. Built up from bigrams to a full backoff model with Kneser-Ney smoothing, and deployed as an interactive text generator.

Year 2026
Type Personal
Status Completed

Overview

This project is a Python implementation of n-gram language models, trained on the Lord of the Rings trilogy. Starting from the simplest version, a bigram model, the project builds up to general n-grams, then layers on three important improvements: pruning, backoff, and Kneser-Ney smoothing.
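To make the smoothing step concrete, here is a minimal sketch of interpolated Kneser-Ney for the bigram case. It is not the project's actual code: the function name train_bigram_kn, the fixed discount of 0.75, and the toy sentence are all assumptions made for illustration. The idea it shows is that each seen bigram count is reduced by a fixed discount, and the freed probability mass is redistributed according to how many distinct contexts a word appears in.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Return prob(prev, word) using interpolated Kneser-Ney bigram estimates.

    Hypothetical helper for illustration; the discount value is an assumption.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()    # how often each word occurs as a context
    followers = defaultdict(set)  # distinct words seen after a context
    preceders = defaultdict(set)  # distinct contexts seen before a word
    for (prev, word), count in bigrams.items():
        context_counts[prev] += count
        followers[prev].add(word)
        preceders[word].add(prev)
    total_bigram_types = len(bigrams)

    def prob(prev, word):
        # Continuation probability: share of bigram types that end in `word`.
        continuation = len(preceders.get(word, ())) / total_bigram_types
        context_total = context_counts[prev]
        if context_total == 0:
            # Unseen context: fall back to the continuation distribution.
            return continuation
        discounted = max(bigrams[(prev, word)] - discount, 0) / context_total
        # Mass removed by discounting, handed to the lower-order distribution.
        backoff_weight = discount * len(followers[prev]) / context_total
        return discounted + backoff_weight * continuation

    return prob


tokens = "the ring was lost and the ring was found".split()
prob = train_bigram_kn(tokens)
print(prob("the", "ring"), prob("the", "lost"))
```

With this toy input, a bigram that was seen ("the ring") scores much higher than one that was not ("the lost"), yet the unseen pair still receives a small non-zero probability, which is the point of smoothing.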

The end result is a text generator that produces new Tolkien-like sentences. You can try it yourself at Ngram.
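As a rough picture of how generation with backoff can work, the sketch below samples one word at a time, using the longest context that was seen in training and dropping to a shorter one otherwise. The names build_counts and generate, the trigram default, and the toy corpus are assumptions for illustration, and the sampling here uses raw counts rather than the smoothed probabilities of the deployed generator.

```python
import random
from collections import Counter, defaultdict


def build_counts(tokens, max_order=3):
    """Map each context tuple (length 0 to max_order - 1) to next-word counts."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(max_order):
            if i - k < 0:
                break
            counts[tuple(tokens[i - k:i])][word] += 1
    return counts


def generate(counts, max_order=3, length=10, seed=0):
    """Sample `length` words, backing off to shorter contexts when needed."""
    rng = random.Random(seed)
    out = []
    for _ in range(length):
        # Try the longest usable context first, then shorter ones.
        # The empty context () always has counts, so the loop always finds one.
        for k in range(min(max_order - 1, len(out)), -1, -1):
            context = tuple(out[len(out) - k:])
            options = counts.get(context)
            if options:
                break
        words = list(options)
        weights = [options[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return out


tokens = "the ring was lost and the ring was found".split()
counts = build_counts(tokens)
print(" ".join(generate(counts, length=8, seed=1)))
```

Fixing the seed makes the output reproducible, which is convenient when comparing model variants on the same prompt.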