What's That? – Photo to personalized audio narrative in under 10 seconds

https://apps.apple.com/us/app/whats-that-ai-audio-guide/id6756409506
1•whatsthatapp•1h ago

Comments

whatsthatapp•1h ago
I've been working on a pipeline that chains vision analysis, parameterized narrative generation, and TTS into a single flow that completes in under 10 seconds (~2s vision, ~5s generation streamed, ~3s TTS). Shipped it as an iOS app.

The pipeline:

1. *Vision analysis* identifies the subject (landmark, artwork, food, signage, museum panel, etc.) and extracts contextual details from the image.

2. *IPOP parameterization* selects the narrative angle. IPOP (Ideas, People, Objects, Physical) is adapted from the Smithsonian's visitor engagement research, which found that museum visitors cluster into four types according to what draws their attention. Users set their IPOP dimension weights in the app. Given the same photo of a building, a user weighted toward "People" gets the story of the craftsman who built it, while a user weighted toward "Ideas" gets its political history.

3. *Narrative generation* produces ~90 seconds of spoken text via LLM. The system uses SSE streaming so text renders on the client while generation is still running. The IPOP weights get injected into the system prompt alongside the vision output.

4. *Non-repetition via session context.* This was the hardest part. If someone photographs three churches in a row, the system needs to find a genuinely different angle each time. The approach: maintain a sliding summary of prior outputs in the session. Before generating, the system checks which IPOP dimensions and narrative angles have already been used, then rotates to an unexplored dimension. So church #1 might get the political history, church #2 gets the story of the stonemason, church #3 gets the acoustic design. Without this, you get "built in 1342, baroque style" on repeat.

5. *TTS* converts the narrative to audio with selectable voice models. Audio generates in the background while the user reads the streamed text.
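
Steps 2 and 4 can be illustrated with a minimal sketch. It is not the app's actual code: the `Session` class, the five-entry history window, the tie-breaking rule, and the prompt wording are all assumptions made for illustration. It keeps a sliding record of the dimensions and angles already used in a session, picks the least-used IPOP dimension (preferring the user's higher weights on ties), and injects the weights, the vision output, and the angles to avoid into a system prompt.

```python
from collections import Counter, deque
from dataclasses import dataclass, field

DIMENSIONS = ("ideas", "people", "objects", "physical")


@dataclass
class Session:
    """Hypothetical per-session state: user weights plus a sliding history."""
    weights: dict
    # Each entry is (dimension, short description of the angle that was used).
    history: deque = field(default_factory=lambda: deque(maxlen=5))

    def next_dimension(self) -> str:
        """Least-used dimension first; ties go to the higher user weight."""
        counts = Counter(dim for dim, _ in self.history)
        return min(
            DIMENSIONS,
            key=lambda d: (counts[d], -self.weights.get(d, 0.0)),
        )

    def record(self, dimension: str, angle: str) -> None:
        self.history.append((dimension, angle))


def build_system_prompt(session: Session, dimension: str, vision_summary: str) -> str:
    """Assemble an illustrative system prompt (wording is an assumption)."""
    used = "; ".join(angle for _, angle in session.history) or "none yet"
    weights = ", ".join(
        f"{d}={session.weights.get(d, 0.0):.2f}" for d in DIMENSIONS
    )
    return (
        "You are an audio guide. Write about 90 seconds of spoken text.\n"
        f"Subject (from vision analysis): {vision_summary}\n"
        f"Visitor IPOP weights: {weights}\n"
        f"Lead with the '{dimension}' dimension.\n"
        f"Angles already used this session (do not repeat): {used}"
    )


session = Session(weights={"ideas": 0.5, "people": 0.3, "objects": 0.1, "physical": 0.1})
first = session.next_dimension()
session.record(first, "political history")
second = session.next_dimension()
session.record(second, "the stonemason")
third = session.next_dimension()
prompt = build_system_prompt(session, third, "church, stone facade")
```

With these example weights, three consecutive photos rotate through three different dimensions instead of returning to the user's top-weighted one each time.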

There's no pre-built content database. The system generates from the image and user profile at request time, which means it handles subjects it's never encountered before, though quality degrades for very obscure or poorly lit subjects.
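
A rough sketch of how the request-time flow could be sequenced follows. The stage functions are stand-ins that only sleep, the durations are the approximate figures quoted above scaled down by an assumed `SCALE` factor, and every name here is hypothetical rather than taken from the app. The point is the ordering: text chunks are rendered as they arrive, and TTS runs as a background task once the narrative is complete.

```python
import asyncio

# Approximate stage durations in seconds from the post, scaled down so the
# sketch runs quickly. SCALE and all function names are illustrative.
SCALE = 0.01
VISION_S, GENERATION_S, TTS_S = 2.0, 5.0, 3.0


async def analyze_image(photo: bytes) -> str:
    """Stand-in for the vision stage; returns a fake subject description."""
    await asyncio.sleep(VISION_S * SCALE)
    return "gothic church facade"


async def stream_narrative(subject: str, chunks: int = 5):
    """Stand-in for the streamed LLM stage; yields chunks as they are 'generated'."""
    for i in range(chunks):
        await asyncio.sleep(GENERATION_S * SCALE / chunks)
        yield f"[{subject} part {i + 1}]"


async def synthesize(text: str) -> bytes:
    """Stand-in for TTS; returns placeholder bytes instead of real audio."""
    await asyncio.sleep(TTS_S * SCALE)
    return text.encode("utf-8")


async def run_pipeline(photo: bytes):
    events = []
    subject = await analyze_image(photo)
    events.append("vision_done")

    parts = []
    async for chunk in stream_narrative(subject):
        parts.append(chunk)               # the client would render this chunk now
        events.append("chunk_rendered")

    # Start TTS in the background; the user is already reading the text.
    tts_task = asyncio.create_task(synthesize(" ".join(parts)))
    events.append("tts_started")
    audio = await tts_task
    events.append("audio_ready")
    return events, audio


events, audio = asyncio.run(run_pipeline(b"fake-photo"))
```

In this arrangement the first text appears shortly after the vision stage finishes, so the wait the user perceives is shorter than the total time to audio.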

The IPOP dimension system is the part I'm least sure about. Four dimensions felt right based on the Smithsonian research, but I'm curious whether finer granularity (splitting "Ideas" into "historical" vs "conceptual," for example) would produce meaningfully different outputs or just add noise.
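
One way to trial finer granularity without touching the four top-level weights is to split a dimension's weight across sub-dimensions and compare outputs with and without the split. The sketch below is only an illustration of that idea: `expand_weights`, the dotted key names, and the example shares are assumptions, not part of the app.

```python
def expand_weights(weights: dict, splits: dict) -> dict:
    """Distribute each split dimension's weight across its sub-dimensions.

    `splits` maps a dimension to relative shares for its sub-dimensions.
    Dimensions without a split are passed through unchanged, so the total
    weight is preserved.
    """
    flat = {}
    for dim, weight in weights.items():
        subs = splits.get(dim)
        if not subs:
            flat[dim] = weight
            continue
        total = sum(subs.values())
        for name, share in subs.items():
            flat[f"{dim}.{name}"] = weight * share / total
    return flat


weights = {"ideas": 0.4, "people": 0.3, "objects": 0.2, "physical": 0.1}
splits = {"ideas": {"historical": 3, "conceptual": 1}}  # hypothetical shares
flat = expand_weights(weights, splits)
```

Because the split only redistributes weight inside "Ideas", the other three dimensions behave exactly as before, which makes a side-by-side comparison of outputs easier to interpret.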

Available on iOS: https://whats-that.app

Show HN: Homecastr - AI home price forecasts on a map

https://www.homecastr.com/
1•dhardestylewis•47s ago•0 comments

Show HN: DevNode.studio, 100% local dev tools to make back end work faster

https://www.devnode.studio/
1•nyosef26•48s ago•1 comments

Brex tests agents: by committing fraud

https://www.brex.com/journal/articles/simulation-testing-ai-audit-agent
2•brandonbloom•5m ago•0 comments

Cryo FAQ

https://notebook.ldeming.com/whyilovecryo/
2•sebg•5m ago•0 comments

AI Slop: A Slack API Rate Limiting Disaster

https://code.dblock.org/2026/03/12/ai-slop-a-slack-api-rate-limiting-disaster.html
1•dblock•6m ago•0 comments

"You're Right"- What if you gave a web dev from 2006 Claude Code?

https://wiredsis.medium.com/youre-absolutely-right-2f4281e0f950
1•chess•6m ago•0 comments

Illinois introduces OS-level age verification law

https://legiscan.com/IL/bill/SB3977/2025
7•rickcarlino•7m ago•0 comments

Sam Altman Says Intelligence Will Be a Utility

https://gizmodo.com/sam-altman-says-intelligence-will-be-a-utility-and-hes-just-the-man-to-collec...
2•cdrnsf•9m ago•1 comments

Now Is the Time to Eat Their Lunch

https://rodyne.com/?p=3875
1•boznz•9m ago•1 comments

Tanker Sea Owl I Boarded in the Baltic Sea

https://polisen.se/aktuellt/nyheter/nationell/2026/mars/tanker-sea-owl-i-boarded-in-the-baltic-sea/
1•madspindel•10m ago•0 comments

A 4 byte file can bypass permissions in a GraphQL package used for payments

https://medium.com/@caplanmaor/prototype-pollution-in-graphql-upload-minimal-cve-2025-65587-a8648...
1•BambaNugat•10m ago•1 comments

Code Quality in the Age of Coding Agents

https://michaeltimbs.me/blog/code-quality-in-the-age-of-coding-agents/
1•alpaylan•10m ago•0 comments

Ask HN: Does code style matter much anymore?

2•travisgriggs•14m ago•1 comments

DirectX: Bringing Console-Level Developer Tools to Windows

https://devblogs.microsoft.com/directx/directx-bringing-console-level-developer-tools-to-windows/
3•haunter•14m ago•0 comments

DIY: Enigma Machine from a Toilet Paper Tube

https://www.flyingpenguin.com/build-your-own-mini-enigma-machine-from-a-toilet-paper-tube/
1•rolph•15m ago•0 comments

Jones Act Enforcer

https://offshoremarine.org/page/JonesActEnforcer
1•signorovitch•15m ago•1 comments

Turn your best X posts into a portfolio people can browse

https://curio-brown.vercel.app
2•NachikethRamesh•16m ago•0 comments

Women of the Flemish Golden Age

https://news.artnet.com/art-world/meet-the-forgotten-women-of-the-flemish-golden-age-2751227
2•petethomas•16m ago•0 comments

Show HN: Stratum – SQL that branches and beats DuckDB on 35/46 1T benchmarks

https://datahike.io/notes/stratum-analytics-engine/
3•whilo•19m ago•1 comments

Show HN: I rebuilt the "similar movies/TV shows" algorithm on TasteFinder

https://tastefinder.io/
2•tastefinder_io•23m ago•0 comments

Killing the Serialization Tax: 1M Entity Ingestion in 11.8µs with C#

https://intelligentaudio.net/nexus-pulse
1•NexusCore•23m ago•1 comments

Safari web browser bugs: A year in review

https://lapcatsoftware.com/articles/2026/3/6.html
1•zdw•24m ago•0 comments

San Francisco is awesome. It could be much better

https://faingezicht.com/articles/2026/03/12/san-francisco/?src=hn
2•avyfain•24m ago•0 comments

Show HN: Codelegate, keyboard-driven coding agent orchestrator GUI for Mac/Linux

https://codelegate.dev/
2•brucehsu•25m ago•0 comments

Infisical in 60 Seconds

https://infisical.com/videos/infisical-in-60-seconds
1•vmatsiiako•25m ago•0 comments

Why Moltbook and OpenClaw are the fool's gold in our AI boom

https://www.zdnet.com/article/moltbook-and-openclaw-fools-gold-in-ai-boom/
1•CrankyBear•25m ago•0 comments

Shall I implement it? No

https://gist.github.com/bretonium/291f4388e2de89a43b25c135b44e41f0
6•breton•27m ago•0 comments

Show HN: Firstrun – Turn static documentation into interactive walkthroughs

https://firstrun.dev
1•mhamda•32m ago•0 comments

AI error jails innocent grandmother for months in North Dakota fraud case

https://www.grandforksherald.com/news/north-dakota/ai-error-jails-innocent-grandmother-for-months...
72•rectang•33m ago•30 comments

Source code of Swedish e-govt services from CGI's "E-plattform" has been leaked

https://twitter.com/IntCyberDigest/status/2032171171798565311
1•toss1•34m ago•0 comments