For the past few weekends, I have been working on a computer vision pipeline to solve a specific PropTech problem: turning messy, highly occluded 2D floor plans into clean, structured data for 3D extrusion. I originally built this as a demo for a firm.
The Stack & Architecture

I built an instance segmentation pipeline that relies on pixel-accurate instance masks to extract the geometry.

- Stack: Swin Transformer backbone + Detectron2 + OpenCV
- Training: 1024x1024 images on an RTX 4090
- Inference: runs on CPU in under 10 s
- Demo performance: 67.1% AP50 for instance segmentation masks, and 38.2% AP averaged across the stricter 0.50:0.95 IoU thresholds
Why I'm posting: A lot of virtual staging and architectural startups have beautiful Three.js rendering engines, but still rely on manual data entry to build the base geometry. I built this specifically as an extraction engine to sit underneath those UIs.
If you are in PropTech or building a product that could benefit from embedding this model under the hood, I would love to chat.