I’m a solo engineer working on a project that came out of frustration with traditional camera systems.
Most cameras answer one question well: “Did something move?” But many practical needs are more semantic:
Is the door open?
Is someone inside the room?
Is a package still there?
Did the light turn off?
So I built a system where you define a natural-language question, connect a camera (webcam, RTSP stream, or edge device), and get notified only when the answer becomes “yes.”
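To make "notified only when the answer becomes yes" concrete, here is a minimal Python sketch of edge-triggered notification. It is an illustration under stated assumptions, not the actual implementation: the name `yes_transitions` and the use of `None` for an "unsure" answer are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def yes_transitions(answers: Iterable[Optional[bool]]) -> Iterator[int]:
    """Yield the index of every answer that flips from not-yes to yes.

    `answers` is the sequence of per-frame answers to one question.
    None means the evaluator was unsure (an assumption of this sketch);
    unsure answers are skipped and leave the remembered state unchanged.
    """
    previous = False  # assume the answer starts out as "no"
    for index, answer in enumerate(answers):
        if answer is None:
            continue
        if answer and not previous:
            yield index  # the answer just became "yes": notify here
        previous = answer


# "Is the door open?" evaluated on six consecutive frames.
door_open = [False, True, True, False, None, True]
print(list(yes_transitions(door_open)))
```

A steady "yes" produces one notification rather than one per frame, and a new one fires only after the answer has gone back to "no" in between.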
Some high-level design choices:
The system evaluates questions rather than raw motion
It avoids continuous video streaming and processes frames only when needed
Processing can run on edge devices for privacy
The cloud side sees only what’s required to answer the question
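One way "processes frames only when needed" could work is a cheap change gate in front of the expensive question evaluation. The sketch below is a guess at the general idea, not how the system actually decides: the flattened grayscale frame format, the mean-absolute-difference metric, the threshold value, and the names `mean_abs_diff` and `select_frames` are all assumptions.

```python
from typing import Iterable, List, Optional, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0-255 (assumed format)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0  # two empty frames are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames: Iterable[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames worth sending to the question evaluator.

    The first frame is always selected. After that, a frame is selected
    only if it differs from the last *selected* frame by at least
    `threshold`, so slow drift still triggers a re-evaluation eventually.
    """
    selected: List[int] = []
    reference: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if reference is None or mean_abs_diff(reference, frame) >= threshold:
            selected.append(index)
            reference = frame
    return selected


frames = [
    [0, 0, 0, 0],      # first frame: always evaluated
    [1, 0, 0, 0],      # sensor noise: skipped
    [100, 100, 0, 0],  # real change: evaluated
    [100, 100, 0, 0],  # unchanged: skipped
]
print(select_frames(frames))
```

Comparing against the last evaluated frame rather than the immediately preceding one is a deliberate choice in this sketch: a door that opens very slowly would otherwise never cross the threshold between adjacent frames.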
This is still early, but it works end to end. What I’m trying to understand now is where it’s genuinely useful beyond the obvious demos.
I’d really appreciate feedback on:
Real-world use cases where this would actually be trusted
Failure modes you’d expect in non-ideal environments
Whether this feels more like infra, a product, or a developer tool
Anything that seems fundamentally flawed in the approach
Happy to answer technical questions if appropriate.