It got me thinking it'd be cool to track this somehow, so I built a website! I am taking a sidewalk livestream, feeding it into a YOLO model for people tracking, then sending a frame of each detected person to Gemini 2.0 Flash, which returns structured JSON about each person's clothing and if they're holding an umbrella. I also had fun making the site look like a TV weather channel.
I showed some friends this project and someone mentioned how the legendary Tasks xkcd comic (https://xkcd.com/1425) is out of date now. If you want to check whether a photo has birds in it (or if someone is holding an umbrella), you can just ask an inexpensive vision model for JSON.
cgsmith•11h ago