Motivation: I love podcasts, especially multi-hour ones that go deep on niche topics. One thing that puts me off some podcasts is having the flow interrupted, sometimes mid-sentence, by dynamically inserted ads. Last year this led me down a rabbit hole of experimenting with removing ads from the podcasts I listen to.
Experimentation: At first I tried using Whisper to generate a transcript of an episode, then feeding this to ChatGPT and asking it to find the ad timestamps. This worked surprisingly well. I had a proof of concept working, but I wanted something I could actually use on my phone.
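Once the LLM has named the ad timestamps, the remaining step is turning them into the audio ranges to keep. A minimal sketch of that post-processing, assuming the LLM's answer has already been parsed into (start_sec, end_sec) pairs (the function names and segment format here are illustrative, not my actual schema):

```python
# Sketch: turn LLM-reported ad spans into the audio ranges worth keeping.
# Assumes ad spans were already parsed into (start_sec, end_sec) tuples.

def merge_segments(segments):
    """Merge overlapping or touching ad spans into disjoint spans."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def keep_ranges(ad_segments, episode_len):
    """Invert ad spans into the non-ad ranges of the episode."""
    keep, cursor = [], 0.0
    for start, end in merge_segments(ad_segments):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < episode_len:
        keep.append((cursor, episode_len))
    return keep
```

For example, two overlapping ad reads at 2:00-3:00 and 2:50-3:20 in a one-hour episode collapse into a single cut: `keep_ranges([(120, 180), (170, 200)], 3600)` leaves `[(0.0, 120), (200, 3600)]` to splice back together.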
Productionization: Next I wondered: if I were to productionize my prototype, how would I do it? From my experiments, the main issue would be the volume of audio transcription required to satisfy a moderate-to-heavy podcast listener, which I estimated at ~100 hours of audio per user per month. In testing I used OpenAI-hosted Whisper, charged at $0.006/min. That sounds quite cheap, but what would it come to for 100 hours? $0.006/min -> $0.36/hour -> $36.00/100 hours. S%#t. $36/user/month is way too expensive. If you were to turn this into a business at that rate you'd probably need to charge at least $50/month. No one's going to pay that.
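The back-of-the-envelope math, as a quick sanity check:

```python
# Cost of hosted Whisper at $0.006 per minute of audio, for a listener
# getting through ~100 hours of podcasts per month (my estimate above).
PRICE_PER_MIN = 0.006            # OpenAI-hosted Whisper, USD per audio minute
hours_per_month = 100

cost_per_hour = PRICE_PER_MIN * 60
monthly_cost = cost_per_hour * hours_per_month
print(f"${cost_per_hour:.2f}/hour -> ${monthly_cost:.2f}/user/month")
# -> $0.36/hour -> $36.00/user/month
```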
What if we did everything on device? iOS 26 ships APIs for an on-device LLM and speech-to-text. I got the podcast audio -> transcript -> LLM-detected ad segments pipeline working. Excellent! The next problem is that an iPhone is not a data-center-grade GPU. The pipeline was significantly slower than my first attempt: before, it would take <= ~4 mins, while the iPhone pipeline could take up to 10 minutes for multi-hour podcasts. The on-device approach would be too slow for a good UX. Not to mention that every test run of the iPhone pipeline made my phone really hot and drained the battery.
Back to square one. The only other approach (at least that I could think of) would be to manage the transcription infra myself. Given this is just a side project, I wanted simple infra. Ideally I would have used something like AWS Lambda with GPUs (which does not exist, I checked). My research showed GCP offers serverless Cloud Run with a GPU option. Now we were starting to cook. I built a spike on GCP and had the ad detection working. Just as I was getting excited, a load test on Cloud Run revealed a new problem.
GPUs are in hot demand. Who knew? GCP is (or at least was) limiting the number of GPUs per customer. My account was only allotted ~3 GPUs (I tried raising a support ticket for a higher limit, but no luck). This was a huge bottleneck: transcribing an episode saturates one GPU, so 3 GPUs means a pitiful 3 episodes being transcribed concurrently :(
Further into the rabbit hole, my research led me to Runpod, which offers serverless GPUs. The low-end GPUs go for ~$0.50/hour (that's an hour of GPU time, not of audio transcribed), depending on the GPU used. With more reliable access to enough GPUs, I could run a load test again. It worked out to ~$0.02/hour, or ~$2.00/100 hours of audio transcribed. At $2 per user per month this looks a lot more reasonable: a 94% decrease compared to the $36 for the OpenAI API. To be fair to OpenAI, the transcripts I got from their API were more accurate. When tuning the Runpod implementation I was optimizing for speed and low cost, and I found that a slightly less accurate transcript didn't matter much when getting the LLM to pick out the ad segments, so trading accuracy for speed + cost made sense here.
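Double-checking the comparison (the constants are my approximate load-test figures, so treat the exact numbers as rough):

```python
# Rough per-user monthly cost comparison at ~100 hours of audio/month.
AUDIO_HOURS = 100
openai_cost = 0.006 * 60 * AUDIO_HOURS   # hosted Whisper at $0.006/min
runpod_cost = 0.02 * AUDIO_HOURS         # ~$0.02 per audio hour, self-hosted

saving = (openai_cost - runpod_cost) / openai_cost
print(f"${openai_cost:.0f} vs ${runpod_cost:.0f}: {saving:.0%} cheaper")
# -> $36 vs $2: 94% cheaper

# The $0.02/audio-hour figure also implies each $0.50/hour GPU chews
# through roughly 25 hours of audio per GPU-hour.
audio_hours_per_gpu_hour = 0.50 / 0.02   # -> 25.0
```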
Anyway, that's my story of building the Octopoddy ad-detection pipeline. Please try it out; I'd love to hear what you think. I'd be happy to provide more details on any of this in the comments :)