
Master's Thesis - HTR with Visual Language Model
In this project, I fine-tuned Visual Language Models (VLMs) and built an end-to-end pipeline for Handwritten Text Recognition (HTR) on historical Swedish manuscripts. I also compared the VLM-based pipeline against the classical YOLO - TrOCR pipeline. The project was done in collaboration with the Swedish National Archives (Riksarkivet).
Models: Florence-2, YOLO, TrOCR
Tools: PyTorch, transformers, Gradio
GitHub Demo