
We built audio/video RAG

https://www.ragie.ai/blog/how-we-built-multimodal-rag-for-audio-and-video
5•mkauffman23•6h ago

Comments

mkauffman23•6h ago
In this blog post we detail the API design and technical decisions we made when adding audio and video support to Ragie's RAG service. We explore some of the approaches we tried and the rationale behind what we landed on. Worth a read if you're building similar systems.

Here's a TLDR:
- Built a full pipeline that processes audio/video → transcription + vision descriptions → chunking → indexing
- Audio: faster-whisper with large-v3-turbo (4x faster than vanilla Whisper)
- Video: chose Vision LLM descriptions over native multimodal embeddings (2x faster, 6x cheaper, better results)
- 15-second video chunks hit the sweet spot for detail vs context
- Source attribution with direct links to exact timestamps
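To make the 15-second chunking and timestamp-link ideas concrete, here is a minimal sketch in plain Python. It is not Ragie's implementation: the names `Chunk`, `fixed_windows`, and `timestamp_link` are hypothetical, and the `#t=` suffix assumes the player honors standard media-fragment URLs.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # chunk start, in seconds
    end: float    # chunk end, in seconds


def fixed_windows(duration_s: float, window_s: float = 15.0) -> list[Chunk]:
    """Split a media duration into consecutive fixed-length windows.

    The last window is shortened so it never runs past the end of the media.
    """
    if duration_s <= 0 or window_s <= 0:
        return []
    count = math.ceil(duration_s / window_s)
    return [
        Chunk(start=i * window_s, end=min((i + 1) * window_s, duration_s))
        for i in range(count)
    ]


def timestamp_link(base_url: str, start_s: float) -> str:
    """Build a link that starts playback at a chunk's start time.

    Assumption: the target player supports the `#t=<seconds>` media fragment.
    """
    return f"{base_url}#t={int(start_s)}"


# Hypothetical 40-second clip: yields windows 0-15, 15-30, and 30-40.
for chunk in fixed_windows(40.0):
    print(chunk, timestamp_link("https://example.com/video.mp4", chunk.start))
```

Each chunk would then be paired with its transcript text and vision description before indexing, so a retrieved result can link straight back to the moment it came from.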

Happy to answer any further questions folks might have!

bobremeika•6h ago
Source attribution with direct links to exact timestamps is truly unique when it comes to A/V RAG solutions.

rdegges•5h ago
Great article. This may be my all-time favorite deep dive post on RAG strategies.

It’s super interesting to me how the process of fully making audio/video searchable requires so much processing. Like, extracting the audio and video, transcribing the audio, chunking the video into 15-sec scenes and describing them visually, etc.

I wonder if as a test you could use the video descriptions, run them as a prompt through something like Veo, then stitch them together into something close to the original. Wild.

mkauffman23•4h ago
I have no idea how accurate the reconstruction would be, but it would make for a wild experiment!

Bookmer.com launched Browser extension for Chrome

https://chromewebstore.google.com/detail/bookmer-launcher/mladlmojookmijmdcdabepbcefjokhfi
1•g_briel•2m ago•0 comments

Show HN: I built BodyCount to track my 'score' but found deeper meaning

https://app.bodycount.love/
1•dsstudios•2m ago•0 comments

Rest in Peace Ozzy

1•quicon•5m ago•0 comments

New Duke Study Finds Obesity Rises with Caloric Intake, Not Couch Time

https://trinity.duke.edu/news/new-duke-study-finds-obesity-rises-caloric-intake-not-couch-time
1•ivewonyoung•5m ago•0 comments

Morse Code

https://kmcd.dev/posts/morse/
1•ingve•7m ago•0 comments

Show HN: How Claude Code Improved My Dev Workflow

1•IgorGanapolsky•7m ago•0 comments

Diffusion Beats Autoregressive in Data-Constrained Settings

https://arxiv.org/abs/2507.15857
1•badmonster•8m ago•1 comments

Liking Yellow Imply Driving a School Bus? Semantic Leakage in LLMs

https://arxiv.org/abs/2408.06518
1•Bluestein•8m ago•0 comments

When Existence is Inefficient (2022)

https://inference-review.com/article/when-existence-is-inefficient
1•aleph_minus_one•12m ago•0 comments

Comment with your favorite local-first content

https://lofi.so/mentions
2•yonz•15m ago•2 comments

The average Apple Watch user gets 49 minutes of deep sleep per night

https://www.empirical.health/blog/apple-watch-deep-sleep-meaning/
2•brandonb•19m ago•0 comments

Windows 11 gets new Black Screen of Death, auto recovery tool

https://www.bleepingcomputer.com/news/microsoft/windows-11-gets-new-black-screen-of-death-auto-recovery-tool/
2•DocFeind•19m ago•0 comments

China begins building largest dam, fuelling fears in India

https://www.bbc.com/news/articles/c4gk1251w14o
1•perihelions•22m ago•0 comments

Show HN: How Claude Code Improved My Dev Workflow

4•IgorGanapolsky•24m ago•1 comments

Despite deepfake audio tech, banks, ISPs push voice print authentication (2021)

https://keydiscussions.com/2021/12/07/despite-the-prevalence-of-deepfake-audio-tech-banks-and-isps-rush-ahead-with-voice-print-authentication-%f0%9f%92%80/
2•spenvo•25m ago•1 comments

The dangers of Musk's new, Manga-style [flirty] chatbot [video]

https://www.youtube.com/shorts/17rkMuExdPI
5•mdp2021•27m ago•2 comments

Qwen3 – Coder

https://old.reddit.com/r/LocalLLaMA/comments/1m6mew9/qwen3_coder/
2•mircea•28m ago•2 comments

Vector Tiles are deployed on OpenStreetMap.org

https://blog.openstreetmap.org/2025/07/22/vector-tiles-are-deployed-on-openstreetmap-org/
3•ikawe•31m ago•0 comments

How Silicon Valley is becoming militarized

https://english.elpais.com/economy-and-business/2025-07-21/big-tech-enters-the-war-business-how-silicon-valley-is-becoming-militarized.html
2•geox•32m ago•0 comments

Show HN: How Claude Code Improved My Dev Workflow

2•IgorGanapolsky•37m ago•0 comments

Checklist Genie – Create Sharable Checklists with Just Your Voice and AI

https://checklistgenie.app
1•alohaplannerapp•39m ago•1 comments

Qwen3-Coder: Agentic Coding in the World

https://qwenlm.github.io/blog/qwen3-coder/
5•danielhanchen•39m ago•1 comments

Ask HN: A Reddit UI where all writing is done by an AI?

1•amichail•39m ago•2 comments

Show HN: A CLI tool for creating Typst screenplay projects

https://github.com/ChaseRensberger/typstscript
1•ChaseRensberger•41m ago•0 comments

Hackers Behind $140M Brazil Banking Heist Turn to Crypto to Launder Their Loot

https://www.coindesk.com/business/2025/07/04/hackers-behind-usd140m-brazil-banking-heist-turn-to-crypto-to-launder-their-loot
2•PaulHoule•42m ago•0 comments

RFC 1392: Internet Users' Glossary

https://www.rfc-editor.org/rfc/rfc1392.html
3•adtac•42m ago•1 comments

A power utility is reporting suspected pot growers to cops. EFF says illegal

https://arstechnica.com/tech-policy/2025/07/eff-moves-to-stop-power-utility-reporting-suspected-pot-growers-to-cops/
6•duxup•42m ago•1 comments

SmoothCSV: The Ultimate CSV Editor

https://smoothcsv.com
4•msephton•42m ago•1 comments

Ask HN: Can You Buy Your Way into Your Dream Job?

3•YoloVibes•44m ago•5 comments

SWE-Bench Verified Is Flawed Despite Expert Review

https://ddkang.substack.com/p/swe-bench-verified-is-flawed-despite
2•yuxuan18•46m ago•0 comments