
Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
166•isitcontent•9h ago•19 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
282•vecti•11h ago•127 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
227•eljojo•11h ago•142 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
12•denuoweb•1d ago•0 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
59•phreda4•8h ago•9 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
81•antves•1d ago•59 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
43•nwparker•1d ago•11 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•1h ago•0 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
15•NathanFlurry•16h ago•5 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•2h ago•4 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
147•bsgeraci•1d ago•61 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•13h ago•5 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•5h ago•1 comment

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•8h ago•1 comment

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•13h ago•11 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•6h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•6h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
171•vkazanov•1d ago•48 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•7h ago•0 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
11•KevinChasse•14h ago•11 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•8h ago•1 comment

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•14h ago•0 comments

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•12h ago•0 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•12h ago•0 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•17h ago•1 comment

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: BPU – Reliable ESP32 Serial Streaming with Cobs and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•14h ago•0 comments

Show HN: Sparrow-1 – Audio-native model for human-level turn-taking without ASR

https://www.tavus.io/post/sparrow-1-human-level-conversational-timing-in-real-time-voice
123•code_brian•3w ago
For the past year at Tavus I've been working to rethink how AI manages timing in conversation, and I've spent a lot of time listening to conversations. Today we're announcing the release of Sparrow-1, the most advanced conversational flow model in the world.

Some technical details:

- Predicts conversational floor ownership, not speech endpoints

- Audio-native streaming model, no ASR dependency

- Human-timed responses without silence-based delays

- Zero interruptions at sub-100ms median latency

- In benchmarks, Sparrow-1 beats all existing models on real-world turn-taking baselines
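The bullet points above describe predicting conversational floor ownership rather than speech endpoints. As a purely illustrative sketch (not Tavus's API; every name and threshold here is invented), a streaming consumer of per-frame floor probabilities might smooth them before deciding the agent should respond:

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class FloorPrediction:
    """Hypothetical per-frame output: who holds the conversational floor."""
    p_user_holds: float   # probability the user keeps the floor
    p_agent_take: float   # probability the agent should take the floor

class FloorTracker:
    """Toy floor-ownership tracker: smooths per-frame predictions and
    decides to respond only when the averaged take-probability stays high.
    A real audio-native model would produce these probabilities itself."""
    def __init__(self, threshold=0.8, window=5):
        self.threshold = threshold
        self.history = deque(maxlen=window)

    def update(self, pred: FloorPrediction) -> bool:
        self.history.append(pred.p_agent_take)
        # Respond only when the smoothed take-probability clears the threshold.
        avg = sum(self.history) / len(self.history)
        return avg >= self.threshold

tracker = FloorTracker()
frames = [0.1, 0.2, 0.9, 0.95, 0.97, 0.99, 0.99]
decisions = [tracker.update(FloorPrediction(1 - p, p)) for p in frames]
```

The point of the smoothing window is that a single spiky frame never triggers a response; the decision flips only once confidence is sustained.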

I wrote more about the work here: https://www.tavus.io/post/sparrow-1-human-level-conversation...

Comments

orliesaurus•3w ago
Literally no way to sign up to try. I put in my email and password and it put me into some waitlist, despite the video saying I could try the model today. That's what makes me mad about these kinds of releases: the marketing and the product don't talk to each other.
qfavret•3w ago
Try signing up for the API platform on the site; you can access it there.
nubg•3w ago
Any examples available? Sounds amazing.
bpanahij•3w ago
Try out the PALs: they all use Sparrow-1. You can try Charlie on Tavus.io; it's in one of the retro-styled windows on the homepage.
nextaccountic•3w ago
> Non-verbal cues are invisible to text: Transcription-based models discard sighs, throat-clearing, hesitation sounds, and other non-verbal vocalizations that carry critical conversational-flow information. Sparrow-1 hears what ASR ignores.

Could Sparrow instead be used to produce high-quality transcriptions that incorporate non-verbal cues?

Or even use Sparrow AND another existing transcription/ASR system to augment the transcription with non-verbal cues?
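One way the suggested augmentation could work, as a toy sketch: merge ASR word timestamps with separately detected non-verbal events into one annotated transcript. The data, event labels, and timings below are invented for illustration:

```python
# Hypothetical: (timestamp_seconds, token) pairs from an ASR pass
# and from a separate non-verbal event detector.
asr_words = [(0.0, "so"), (0.4, "I"), (0.6, "was"), (2.5, "thinking")]
nonverbal = [(1.2, "[sigh]"), (2.0, "[throat-clear]")]

def annotate(words, events):
    """Interleave words and non-verbal events by timestamp."""
    merged = sorted(words + events, key=lambda t: t[0])
    return " ".join(tok for _, tok in merged)

print(annotate(asr_words, nonverbal))
# so I was [sigh] [throat-clear] thinking
```

The downstream LLM then sees the hesitation cues inline with the words, which is roughly what the Raven-1 reply below describes doing with tone.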

bpanahij•3w ago
This is a very good idea. We currently have a model in our perception system (Raven-1) that performs this partially. It uses audio to understand tone and augment the transcription we send to the conversational LLM. That seems to have an impact on the conversational style of the replica's output, in a good way. We're still evaluating that model and will post updates when we have better insights.
randyburden•3w ago
Awesome. We've been using Sparrow-0 in our platform since launch, and I'm excited to move to Sparrow-1 over the next few days. Our training and interview pre-screening products rely heavily on Tavus's AI avatars, and this upgrade (based on the video in your blog post) looks like it addresses some real pain points we've run into. Really nice work.
bpanahij•3w ago
That’s great! I also built Sparrow-0, and Sparrow-1 was designed to address Sparrow-0’s shortcomings. Sparrow-1 is a much better model, both in terms of responsiveness and patience.
dfajgljsldkjag•3w ago
I am always skeptical of benchmarks that show perfect scores, especially when they come from the company selling the product. It feels like everyone claims to have solved conversational timing these days. I guess we will see if it is actually any good.
fudged71•3w ago
Different industry, but our marketing guy once said "You know what this [perfect] metric means? We can never use it in marketing because it's not believable"
khalic•3w ago
Just include some noise, it’s like the most available resource in the universe
drob518•3w ago
Never thought of noise as a resource, but yea.
bpanahij•3w ago
You should be skeptical, and try it out. I selected 28 long conversations for our evaluation set, all unseen audio. Every turn taking model makes tradeoffs, and I tried to make the best tradeoffs for each model by adjusting and tuning the implementations. I’m certainly not in a position as the creator of Sparrow to be totally objective. However we did use unaltered real conversational audio to evaluate. I tried to find examples that would challenge Sparrow-1 with lots of variation in speaker style across the conversations.
cuuupid•3w ago
The first time I met Tavus, their engineers (incl. Brian!) were perfectly willing to sit down and build their own better InfiniBand to get more juice out of H100s. There is pretty much nobody working on latency and realtime at the level they are. Sparrow-1 would be a defining achievement for most startups, but it will just be one of dozens for Tavus :)
lostmsu•3w ago
> perfectly willing

dreaming

bpanahij•3w ago
Maybe InfiniBand is a bit more than we can handle. That technology is incredible! You are right though, we have been willing to build things we needed that didn’t exist yet, or were not fast enough or natural enough. Sparrow-1, Raven-1, and Phoenix-4 are all examples of that, and we have more on the way.
ttul•3w ago
I tried talking to Claude today. What a nightmare. It constantly interrupts you. I don’t mind if Claude wants to spend ten seconds thinking about its reply, but at least let ME finish my thought. Without decent turn-taking, the AI seems impolite and it’s just an icky experience. I hope tech like this gets widely distributed soon because there are so many situations in which I would love to talk with a model. If only it worked.
mavamaarten•3w ago
Agreed. English is not my native language. And I do speak it well, it's just that sometimes I need a second to think mid-sentence. None of the live chat models out there handle this well. Claude just starts answering before I've even had the chance to finish a sentence.
Tostino•3w ago
English is my native language, and I still have this problem all the time with voice models.
sigmoid10•3w ago
Anthropic doesn't have any realtime multimodal audio models available, they just use STT and TTS models slapped on top of Claude. So they are currently the worst provider if you actually want to use voice communication.
code_brian•3w ago
It's unfortunate though, because Anthropic's LLMs and ecosystem are the best, IMHO. Tavus (we) and Anthropic should form a partnership.
sigmoid10•3w ago
I think Anthropic currently has a slight edge for coding, but this is changing constantly with every new model. For business applications, where tool calling and multi-modality matter a lot, OpenAI is and always has been superior. Only recently Google started to put some small dents in their moat. OpenAI also has the best platform, but less because it is good and more because Google and Anthropic are truly dismal in every regard when it comes to devx. I also feel like Google has accrued an edge in hard-core science, but that is just a personal feeling and I haven't seen any hard data on this yet.
MrDunham•3w ago
I love Anthropic's models but their realtime voice is absolutely terrible. Every time I use it there is at least once that I curse at it for interrupting me.

My main use case for OpenAI/ChatGPT at this point is realtime voice chats.

OpenAI has done a pretty great job w/ realtime (their realtime API is pretty fantastic out of the box... not perfect, but pretty fantastic and dead simple setup). I can have what feels like a legitimate conversation with AI and it's downright magical feeling.

That said, the output is created by OpenAI models so it's... not my favorite.

I sometimes use ChatGPT realtime to think through/work through a problem/idea, have it create a detailed summary, then upload that summary to Claude to let 4.5 Opus rewrite/audit and come up with a better final output.

code_brian•3w ago
I use Claude Code for everything, and I love Anthropic's models. I don't know why, but it wasn't until reading this that I realized: I can use Sparrow-1 with Anthropic's models within CVI. Adding this to my todo list.
butlike•3w ago
Am I not allowed to cut you off if you're ramble-y and incoherent?
BizarroLand•3w ago
It's rude if you're a human, and entirely unacceptable if you are a computer.
code_brian•3w ago
The one thing that really surprised me, the thing I learned that's affected my conversational abilities the most: turn-taking in conversation is a negotiation; there are no set rules. There are protocols:

- bids
- holds / stays
- implications (semantic / prosodic)

But then the actual flow of the conversation is deeply semantic in the best conversations, and the rules are very much a "dance" or a negotiation between partners.
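A toy rendering of that bid/hold/yield negotiation, with the rules invented purely for illustration (this is a reader's sketch, not how Sparrow-1 actually models it):

```python
from enum import Enum, auto

class Floor(Enum):
    """Who currently holds the conversational floor."""
    USER = auto()
    AGENT = auto()

def negotiate(floor: Floor, holder_holds: bool, other_bids: bool) -> Floor:
    """One negotiation step: the floor changes hands when the other
    party bids and the current holder isn't actively holding.
    An explicit hold beats a bid; with no bid and no hold, nothing moves."""
    other = Floor.AGENT if floor is Floor.USER else Floor.USER
    if holder_holds:
        return floor   # a hold/stay signal keeps the floor
    if other_bids:
        return other   # an uncontested bid wins the floor
    return floor
```

Even this crude version captures the "dance" aspect: the outcome depends on both parties' signals at once, not on either one alone.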

BizarroLand•3w ago
That's an interesting way to think about it, I like that.

It also implies that being the person who has something to say but is unable to get into the conversation due to following the conversational semantics is akin to going to a dance in your nice clothes but not being able to find a dance partner.

code_brian•3w ago
Yeah, I can relate to that. Maybe it's also because you are too shy to ask someone to dance. I think I learned that lesson: just ask, and be unafraid to fail. Things tend to work themselves out. Much of this is experimentation. I think our models need to be open to that, which is one cool thing about Sparrow-1: it's a meta-in-context learner. This means that when it tries and fails, or you try and fail, it learns at runtime to adapt.
Taikonerd•3w ago
Agreed. I tried using Gemini's voice interface in their app. It went like this:

===

ME: "OK, so, I have a question about the economics of medicine. Uh..." [pauses to gather thoughts to ask question]

GEMINI: "Sure! Medical economics is the field of..."

===

And it's aggravated by the fact that all the LLMs love to give you page-long responses before it's your turn to talk again!

mentalgear•3w ago
Metric    | Sparrow-1
Precision | 100%
Recall    | 100%

Common ...

reubenmorais•3w ago
If you watch the demo video you can see how they would get this: the model is not aggressive enough. While it doesn't cut you off, which is nice, it also always waits an uncanny amount of time to chime in.
oersted•3w ago
That should lead to a low recall: too many false negatives. I wonder how they are calculating it.
bpanahij•3w ago
The response timing chart in the blog post shows that, beyond perfect precision/recall, Sparrow-1 also has the fastest true-positive response times.

The turn-taking models were evaluated in a controlled environment with no additional cascaded steps (LLM, TTS, Phx). This matters for an apples-to-apples comparison: it keeps the rest of the pipeline's variability from influencing the measurements.

The video conversation examples are Sparrow-1 within the full pipeline. These responses aren’t as fast as Sparrow itself because the LLM, TTS, facial rendering, and network transport also take time. Without Sparrow-1 they would be slower. Sparrow-1 is what lets the responses be as fast as they are, and with a faster CVI pipeline configuration the responses can be as fast as 430ms in my testing.
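One hedged guess at how turn-transfer precision/recall could be scored against reference conversations: count a predicted floor transfer as a true positive if it lands within a tolerance window of a reference transfer. The tolerance value and greedy matching below are assumptions, not the evaluation Tavus describes:

```python
def score(predicted, reference, tol=0.5):
    """Greedy one-to-one matching of predicted vs. reference
    floor-transfer times (seconds), within +/- tol."""
    matched = set()
    tp = 0
    for p in predicted:
        for i, r in enumerate(reference):
            if i not in matched and abs(p - r) <= tol:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# All three predictions land within 0.5s of a reference transfer:
p, r = score([1.0, 4.9, 9.2], [1.2, 5.0, 9.0])
```

Under a scheme like this, a model that never cuts in (the behavior noted in the demo video) would show up as missed reference transfers, i.e. lower recall, which is why the 100%/100% figures prompt the question above.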

krautburglar•3w ago
Such things were doing a good-enough job scamming the elderly as it is--even with the silence-based delays.
bpanahij•3w ago
That’s unfortunate and certainly not what I spend my time dreaming about. My favorite use case for the elderly is as a sort of companion for sharing their story with future generations. One of our partners uses our technology to help the elderly. But yeah, this kind of technology makes AI feel more natural, so we should be aware of that and make sure it’s used for good.
sourcetms•3w ago
How do I try the demo for Sparrow-1? What is pricing like?
bpanahij•3w ago
You can try Sparrow-1 with any of our PALs, or by signing up for a developer account.
ljoshua•3w ago
Hey @code_brian, would Tavus make the conversational audio model available outside of the PALs and video models? Seems like this could be a great use case for voice-only agents as well.
code_brian•3w ago
You can reach out to our sales team. You can chat with our AI SDR here, and they will review it and reach out. https://www.tavus.io/demo
allan_s•3w ago
How does it compare with https://github.com/KoljaB/RealtimeVoiceChat, which is absent from the benchmark?
bpanahij•3w ago
I haven’t tried that one yet, I’ll check it out.
sippeangelo•3w ago
That's not a turn-taking model, it's just a silence detection Python script based on whatever text comes out of Whisper...
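For contrast, the silence-based endpointing the comment describes can be sketched in a few lines (thresholds invented). This is exactly the approach that fires during mid-sentence pauses, which the audio-native model is meant to avoid:

```python
def silence_endpoint(frame_energies, silence_thresh=0.05, min_silent=8):
    """Declare end-of-turn after min_silent consecutive low-energy frames.
    Returns the frame index where the turn is declared over, else None."""
    silent_run = 0
    for i, e in enumerate(frame_energies):
        silent_run = silent_run + 1 if e < silence_thresh else 0
        if silent_run >= min_silent:
            return i
    return None

# A mid-sentence pause of 8 quiet frames falsely triggers end-of-turn,
# even though the speaker resumes afterwards:
energies = [0.3] * 10 + [0.01] * 8 + [0.3] * 5
cut_at = silence_endpoint(energies)
```

The failure mode is structural: energy alone can't distinguish "done talking" from "pausing to think", no matter how the threshold is tuned.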
nubg•3w ago
Btw, while I think this is cool and useful for real-time voice interfaces for the general populace, I wonder if for professional users (e.g. a dev coding by dictation all day) a simple push-to-talk isn't always going to be superior: you can make long pauses while you think about something, which would creep out a human, but the AI would wait patiently for your push-to-talk.
bpanahij•3w ago
As a dev myself, I see a few modes of operation:

- push to talk
- long-form conversation
- short-form conversation

In both conversational approaches the AI can respond with simple acknowledgements. When prompted by the user the AI could go into longer discussions and explanations.

It might be nice for the AI to quickly confirm it hears me and to give me subtle cues that it’s listening: backchannels (“yeah”) and non-verbal sounds (“mhmm”). So I can imagine having a developer assistant that feels more like working with another dev than working with a computer.

That being said, there is room for all modes, all at the same time, shifting between them at different times. A lot of the time I just don’t want to talk at all.

vpribish•3w ago
What is "ASR" - automatic speech recognition?
code_brian•3w ago
Ah good question: Yes, ASR stands for Automatic Speech Recognition.
pugio•3w ago
It sounds really cool, but I don't see any way of trying the model directly. I don't actually want a "Persona" or "Replica" - I just want to use the sparrow-one model. Is there any way to just make API calls to that model directly?
arkobel•2w ago
Have you compared with Krisp-TT models? https://krisp.ai/blog/krisp-turn-taking-v2-voice-ai-viva-sdk... Krisp LLC also shares an End-of-Turn Test dataset. Did you test your model on that? https://huggingface.co/datasets/Krisp-AI/turn-taking-test-v1

And can you share some information about the model size and FLOPS?