01 — Experience 02 — Projects 03 — Tech Stack 04 — About 05 — Contact
Back to projects
PyTorch NLP Sequence Labelling

POS Tagging:
CRF vs Softmax

Part of speech tagging on Italian text using a 3 layer BiLSTM with FastText embedding, comparing a standard Softmax classifier against a Conditional Random Field decoder.

Year 2025
Type Academic
Status Completed
Project screenshot

Overview

Part of speech (POS) tagging is the task of assigning a grammatical label to every word in a sentence. It is one of the foundational NLP tasks: accurate POS tags inform nearly every downstream task from parsing to information extraction.

This project trains and compares two sequence labelling models on the Italian EAGLES corpus. Both share the same BiLSTM backbone and FastText word embeddings; the only difference is in the output layer. The first model uses a standard Softmax classifier treating each token independently at decode time. The second replaces this with a Conditional Random Field (CRF), which models tag-to-tag transition probabilities and decodes the full sequence jointly using the Viterbi algorithm.