Master's Thesis - HTR with Visual Language Model

In this project, I fine-tuned Visual Language Models (VLMs) and created an end-to-end pipeline for Handwritten Text Recognition (HTR) on historical Swedish manuscripts. The VLM-based pipeline is also compared against the classical YOLO-TrOCR pipeline. The project was done in collaboration with the Swedish National Archives (Riksarkivet). Models: Florence-2, YOLO, TrOCR. Tools: PyTorch, transformers, Gradio. GitHub Demo

July 14, 2025 · 1 min · Ha Pham

Kitchen Monitoring

This project simulates a monitoring system for kitchens, using YOLO for object detection and tracking. The detection model is trained to track two types of items (dish, tray), and each item type has its own classification model that further sorts items into three sub-classes, depending on the content of the dish or tray. The system includes a module for running inference and a module that lets users re-annotate the detections themselves. ...

July 14, 2025 · 1 min · Ha Pham

Language Detector

In this project, I trained two models for the language detection task, using the WiLI-2018 dataset. Models: Naive Bayes, XLM-RoBERTa. Tools: sklearn, transformers, Streamlit, FastAPI, Docker. GitHub

July 14, 2025 · 1 min · Ha Pham

Music ETL

With this project, I aim to combine the original Million Song Dataset with data from the Spotify API. With this pipeline, researchers who are already familiar with the Million Song Dataset can get new information for all the songs on Spotify, such as new audio features, song popularity, artist popularity… Tools: Amazon S3, Redshift, EC2, Prefect, Terraform

July 15, 2025 · 1 min · Ha Pham