Hi HN! My name's Kiern and I've been working on Leilani (https://leilani.dev), a platform for connecting your PBX to real-time AI using OpenAI's real-time API.
What I settled on was building what is essentially a softphone. Leilani connects to your PBX with a SIP username and password, and you are then free to use that extension however you want.
Leilani comes with prebuilt integrations for things like ticket creation and calendar scheduling (with more vendor support coming soon), but you can also create custom functions that can fetch data over HTTP. Leilani also supports RAG out of the box.
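To give a feel for what a custom function that fetches data over HTTP could boil down to, here is a minimal sketch in Rust. Everything in it is hypothetical and only for illustration: the `CustomFunction` struct, the `{city}` placeholder syntax, and the `weather.example` URL are assumptions, not Leilani's real configuration format. It only shows the general idea of turning the arguments the model supplies into a URL to GET.

```rust
/// A hypothetical custom function: a name the model can call plus a URL template.
/// This is an illustrative shape, not Leilani's actual data model.
pub struct CustomFunction {
    pub name: String,
    /// Placeholders look like `{city}` and are filled from the call's arguments.
    pub url_template: String,
}

/// Percent-encode every byte except the RFC 3986 "unreserved" characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

impl CustomFunction {
    /// Build the URL to request for one invocation by substituting each
    /// `{key}` placeholder with the percent-encoded argument value.
    pub fn render_url(&self, args: &[(&str, &str)]) -> String {
        let mut url = self.url_template.clone();
        for (key, value) in args {
            let placeholder = format!("{{{}}}", key);
            url = url.replace(&placeholder, &percent_encode(value));
        }
        url
    }
}
```

With a template like `https://weather.example/v1/current?city={city}`, calling `render_url(&[("city", "San José")])` yields a URL with the city safely encoded. Actually making the request and handing the response back to the model is left out of the sketch.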
You can set up Leilani in literally under a minute. I'm not being facetious: it takes more time to create the OpenAI API key.
Another benefit of this approach that builders appreciate is just how SMALL you can make things. If you want an extension that you can call and it just tells you the weather, you can build just that. Most commercial solutions require lots of spend and demos and yadayadayada.
How It Works!
The back end is written in Rust, with a custom (very minimal) SIP implementation. When we get a SIP INVITE, we set up the media, which will later stream audio to and from OpenAI's real-time API via WebSockets. After we get the ACK request to our OK response, we start the media. Leilani handles the logic layer and the bridging of protocols so you can build cool stuff fast.
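As a rough illustration of that ordering (INVITE, then OK, then ACK, then media), the call setup can be thought of as a small state machine. This is a simplified sketch, not the actual Leilani code: the names `CallState`, `SipEvent`, `Action`, and `step` are made up for the example, and a real SIP stack also has to deal with retransmissions, CANCEL, re-INVITEs, timers, and so on.

```rust
/// Where a single inbound call is in its setup. Hypothetical names for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallState {
    /// No dialog yet.
    Idle,
    /// INVITE received, media prepared, 200 OK sent; waiting for the ACK.
    AwaitingAck,
    /// ACK received; audio is being bridged.
    MediaActive,
    /// Call torn down.
    Ended,
}

/// The SIP requests this sketch cares about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SipEvent {
    Invite,
    Ack,
    Bye,
}

/// What the caller of `step` should do after a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Allocate the media session, then answer with 200 OK.
    PrepareMediaAndSendOk,
    /// Begin streaming audio in both directions.
    StartMedia,
    /// Stop streaming and release the media session.
    TearDown,
    /// Event is not valid in this state; do nothing.
    Ignore,
}

/// Advance the call by one event, returning the new state and the action to take.
pub fn step(state: CallState, event: SipEvent) -> (CallState, Action) {
    use CallState::*;
    match (state, event) {
        (Idle, SipEvent::Invite) => (AwaitingAck, Action::PrepareMediaAndSendOk),
        (AwaitingAck, SipEvent::Ack) => (MediaActive, Action::StartMedia),
        (AwaitingAck, SipEvent::Bye) | (MediaActive, SipEvent::Bye) => (Ended, Action::TearDown),
        (s, _) => (s, Action::Ignore),
    }
}
```

The point of the sketch is that media is prepared on the INVITE but only started once the ACK arrives, so no audio is streamed for a call that never completes its handshake.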
You can monitor calls via live transcription in the UI, or by using your PBX's existing features for call monitoring. Remember, it's just an extension!
To use RAG, simply upload a file in the UI, or connect to the exposed WebDAV server and upload/sync files there.
Looking forward to feedback and discussion!