This is an SDK you can put in front of your STT (speech-to-text). It tells you whether your device is being spoken to, without needing a wakeword. You can use it for:
- Single AI, multi human
- Multi AI, single human
- Multi AI, multi human (we recommend also adding a wakeword on top for a better system)
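Roughly, "in front of your STT" means it acts as a gate: frames only reach the transcriber while the device is being addressed. The sketch below is hypothetical and is not the SDK's API. `Frame`, `gate_for_stt`, and `addressed_score` are made-up names, where `addressed_score` stands in for whatever per-frame "is this directed at the device?" score the model gives you. The hysteresis thresholds (`on_threshold`, `off_threshold`) are my own illustrative assumption, not SDK parameters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Frame:
    """Hypothetical chunk of captured input: a timestamp plus raw audio bytes."""
    timestamp: float
    audio: bytes


def gate_for_stt(
    frames: Iterable[Frame],
    addressed_score: Callable[[Frame], float],  # hypothetical stand-in for the model's output
    on_threshold: float = 0.7,   # illustrative value, not an SDK parameter
    off_threshold: float = 0.4,  # illustrative value, not an SDK parameter
) -> Iterator[Frame]:
    """Yield only the frames that should be forwarded to STT.

    Hysteresis: the gate opens once the score reaches on_threshold and closes
    only after it drops below off_threshold, so a brief dip does not cut an
    utterance in half.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    is_open = False
    for frame in frames:
        score = addressed_score(frame)
        if not is_open and score >= on_threshold:
            is_open = True
        elif is_open and score < off_threshold:
            is_open = False
        if is_open:
            yield frame  # in a real pipeline, hand this frame to your STT


if __name__ == "__main__":
    # Toy scores keyed by timestamp, standing in for model output.
    scores = {0.0: 0.1, 0.1: 0.8, 0.2: 0.5, 0.3: 0.3, 0.4: 0.9}
    frames = [Frame(t, b"") for t in scores]
    passed = gate_for_stt(frames, lambda f: scores[f.timestamp])
    print([f.timestamp for f in passed])  # [0.1, 0.2, 0.4]
```

Hysteresis is just one simple way to keep the gate from flickering. In the multi-AI, multi-human case you would still want a wakeword check layered on top, as suggested above.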
There are two models: one that uses video + audio and one that uses audio only. Overall, it works by looking for shifts in attention patterns (changes in body language, vocal patterns). It's a tough problem to nail, since every person interacts with people and devices a little differently.
Let me know how it is!