I built an app that parses transcripts of political appearances to extract promises (defined as forward-looking statements that represent a commitment). Each promise is tracked with a timestamped link to the exact moment it was made.
I’m using pgvector and semantic analysis to group similar promises together, effectively identifying when the same idea is repeated. This allows me to generate a timeline showing how each promise has evolved over time.
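To make the grouping step concrete, here is a minimal sketch in plain Python rather than the actual queries the app runs. It assumes each promise already has an embedding vector, and it greedily assigns a promise to the first group whose earliest member is within a cosine-distance threshold. In Postgres the same comparison would presumably use pgvector's `<=>` cosine distance operator. The tuple layout, the `group_promises` name, and the `0.2` threshold are all made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def group_promises(promises, max_distance=0.2):
    """Group promises whose embeddings are close to a group's first member.

    `promises` is a list of (iso_date, text, embedding) tuples. This layout and
    the default threshold are assumptions for this sketch, not the app's schema.
    """
    groups = []
    # Process in date order so each group reads as a timeline of one idea.
    for promise in sorted(promises, key=lambda p: p[0]):
        for group in groups:
            if cosine_distance(group[0][2], promise[2]) <= max_distance:
                group.append(promise)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([promise])
    return groups
```

Because promises are processed in date order, each resulting group is already the timeline for one idea, earliest statement first.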
A cron job updates the data nightly, uploads it to Hugging Face [1], and makes it available for download [2].
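A rough sketch of what the nightly export could look like, assuming the promises are serialized as JSON Lines and pushed with the `huggingface_hub` client. The `promises.jsonl` file name, the record fields, and both function names are hypothetical; only `HfApi.upload_file` is a real library call, and it needs a Hugging Face token configured in the environment.

```python
import json


def to_jsonl(promises):
    """Serialize a list of dicts as JSON Lines with stable key order."""
    return "".join(
        json.dumps(p, sort_keys=True, ensure_ascii=False) + "\n" for p in promises
    )


def upload_snapshot(promises, repo_id, path_in_repo="promises.jsonl"):
    """Push the snapshot to a Hugging Face dataset repo.

    The file name is an assumption. Requires the third-party `huggingface_hub`
    package and a configured access token.
    """
    from huggingface_hub import HfApi  # imported lazily; third-party dependency

    HfApi().upload_file(
        path_or_fileobj=to_jsonl(promises).encode("utf-8"),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

A cron entry would then just call something like `upload_snapshot(rows, "jevon/buildcanada-2025")` once a night.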
The most interesting technical challenge was accurately parsing timestamps and capturing the surrounding context that gives meaning to each promise.
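For a sense of what that involves, here is a simplified sketch, not the app's actual parser. It assumes transcript lines look like `[HH:MM:SS] text` or `[MM:SS] text` (real transcripts are messier), converts each timestamp to a seconds offset that a link could point at, and attaches the neighboring lines as context. The line format, the `window` parameter, and the function names are all assumptions.

```python
import re

# Matches "[HH:MM:SS] text" or "[MM:SS] text"; this line format is an assumption.
TIMESTAMP_RE = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s*(.*)$")


def parse_transcript(raw):
    """Return a list of (offset_seconds, text) for every timestamped line."""
    segments = []
    for line in raw.splitlines():
        match = TIMESTAMP_RE.match(line.strip())
        if not match:
            continue  # skip lines without a recognizable timestamp
        hours, minutes, seconds, text = match.groups()
        offset = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append((offset, text))
    return segments


def with_context(segments, index, window=1):
    """Bundle the segment at `index` with `window` segments on either side."""
    start = max(0, index - window)
    end = min(len(segments), index + window + 1)
    return {
        "offset_seconds": segments[index][0],
        "promise": segments[index][1],
        "context": " ".join(text for _, text in segments[start:end]),
    }
```

Keeping the offset in seconds makes it easy to build a deep link to the exact moment, while the context string gives whatever extracts the promise enough surrounding text to interpret it.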
[1] https://huggingface.co/datasets/jevon/buildcanada-2025/tree/...
[2] https://2025.buildcanada.com/data