I had 14,000 photos sitting on a drive and wanted an excuse to play with local vision models and Elixir/Phoenix. I originally tried to get LLaVA to tell me whether a photo was 'good' or matched my style, but quickly learned that LLMs have terrible taste. I ended up demoting the LLM to metadata extraction and built a custom CLIP + ridge regression pipeline that actually learns my preferences from how I rate things.
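The preference-learning part can be sketched roughly as follows: embed each photo with CLIP, then fit a ridge regression from embedding to your star rating. This is a minimal sketch, not the author's actual code; it uses random vectors as stand-ins for real CLIP embeddings, and the synthetic "taste" target is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Stand-ins for CLIP image embeddings (e.g. 512-dim vectors from a
# CLIP vision encoder); in a real pipeline these come from the model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512))

# A hidden "taste" direction generates synthetic 1-5 star ratings,
# purely so the sketch has something to fit.
taste = rng.normal(size=512)
ratings = np.clip(embeddings @ taste * 0.05 + 3.0, 1.0, 5.0)

# Ridge regression maps embedding -> predicted rating; the L2 penalty
# keeps the weights sane when you have far fewer rated photos than
# embedding dimensions.
model = Ridge(alpha=10.0)
model.fit(embeddings, ratings)

# Score an unrated photo's embedding.
new_photo = rng.normal(size=(1, 512))
score = model.predict(new_photo)[0]
```

The appeal of this over asking an LLM to judge taste is that the model is tiny, retrains in milliseconds every time you add a rating, and is grounded entirely in your own labels.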
The stack is Phoenix/Oban on the orchestrator side, and Python/FastAPI/Instructor for the AI workers. Happy to answer any questions about the architecture, fighting with local RAW file ingestion, or the pains of Pydantic validation with open weights.
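On the worker side, Instructor's core trick is forcing LLM output through a Pydantic schema and retrying on validation failure. A minimal sketch of the validation half, with a hypothetical metadata schema (field names are illustrative, not the actual one used here):

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical schema the extraction worker asks the LLM to fill.
class PhotoMetadata(BaseModel):
    subject: str = Field(description="Main subject of the photo")
    scene: str = Field(description="e.g. 'street', 'landscape', 'portrait'")
    tags: list[str] = Field(default_factory=list)

# A well-formed LLM response parses cleanly.
raw = {"subject": "heron", "scene": "landscape", "tags": ["bird", "water"]}
meta = PhotoMetadata.model_validate(raw)

# Open-weight models often emit near-miss JSON (wrong types, missing
# keys); the ValidationError is what Instructor feeds back to the
# model to trigger a retry.
try:
    PhotoMetadata.model_validate({"scene": 3})
    retried = False
except ValidationError:
    retried = True
```

In the full setup, Instructor patches the LLM client so this validate-and-retry loop happens automatically per request.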
aleksuix•34m ago