frontpage.

Made with ♥ by @iamnishanth

Open Source @Github

Show HN: I indexed 8,643 BSides talks across 227 chapters and 6 continents

https://allbsides.com/
2•Parkado•2h ago
Hi HN,

I'm Roland. For the past few weeks, I've been building AllBSides, a directory of every BSides conference talk uploaded to YouTube. As of today, it covers 8,643 talks from 5,927 speakers across 227 chapters in 68 countries, with a combined runtime of 280 days. The transcripts total about 60 million words.

The archive came together in stages:

1. Manually map every BSides chapter's YouTube channel.
2. Pull every video and transcript from Supabase.
3. Run each transcript through Haiku for tag extraction (tools, topics, difficulty, team, talk style, research method, and much more).
4. Run the results through Sonnet for categorization and dedup.
5. Send everything through Opus for a final verification pass.
6. Verify manually. At one point, the pipeline had over 16k AI suggestions queued for manual verification; today, most are resolved.
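The stages above can be read as a chain in which each pass refines the previous pass's output, and whatever is still uncertain lands in a manual-review queue. A minimal Go sketch under stated assumptions: the `Suggestion` type, the `Stage` signature, and the 0.8 confidence cutoff are all hypothetical, and the model calls are stubbed as plain functions because the post does not say how Haiku, Sonnet, and Opus were invoked.

```go
package main

import (
	"fmt"
	"strings"
)

// Suggestion is a hypothetical record for one AI-suggested tag on one talk.
type Suggestion struct {
	TalkID     string
	Tag        string
	Category   string  // e.g. "tool", "framework", "protocol", "concept"
	Confidence float64 // 0..1, as reported by the (stubbed) model pass
}

// Stage is one model pass (categorization, dedup, verification, ...).
// A real stage would call an LLM; here it is just a Go function.
type Stage func([]Suggestion) []Suggestion

// runPipeline applies the stages in order, then splits the final output into
// accepted suggestions and a manual-review queue using a confidence cutoff.
func runPipeline(input []Suggestion, stages []Stage, cutoff float64) (accepted, review []Suggestion) {
	current := input
	for _, stage := range stages {
		current = stage(current)
	}
	for _, s := range current {
		if s.Confidence >= cutoff {
			accepted = append(accepted, s)
		} else {
			review = append(review, s)
		}
	}
	return accepted, review
}

func main() {
	// Made-up output of an extraction pass over one transcript.
	extracted := []Suggestion{
		{TalkID: "t1", Tag: "Wireshark", Confidence: 0.6},
		{TalkID: "t1", Tag: "wireshark", Confidence: 0.5},
		{TalkID: "t1", Tag: "Nmap", Confidence: 0.4},
	}
	// Toy dedup (stand-in for a categorization/dedup pass):
	// keep the first suggestion per talk and case-insensitive tag.
	dedup := func(in []Suggestion) []Suggestion {
		seen := make(map[string]bool)
		var out []Suggestion
		for _, s := range in {
			key := s.TalkID + "|" + strings.ToLower(s.Tag)
			if !seen[key] {
				seen[key] = true
				out = append(out, s)
			}
		}
		return out
	}
	// Toy verifier (stand-in for a verification pass): pretend it confirmed Wireshark.
	verify := func(in []Suggestion) []Suggestion {
		out := make([]Suggestion, len(in))
		for i, s := range in {
			if s.Tag == "Wireshark" {
				s.Confidence = 0.95
			}
			out[i] = s
		}
		return out
	}
	accepted, review := runPipeline(extracted, []Stage{dedup, verify}, 0.8)
	fmt.Printf("accepted=%d review=%d\n", len(accepted), len(review))
}
```

Keeping the review queue as an explicit output, rather than dropping low-confidence items, mirrors the manual-verification step: nothing is silently discarded.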

Total LLM cost so far: about €200. The whole pipeline is rebuildable from scratch.

Each talk gets its own page with embedded video, full transcript, speakers, tags, and "related talks." Each tool/framework/protocol/standard mentioned across the corpus gets its own page (3,968 distinct technologies tracked).
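The post does not say how "related talks" are chosen. One common approach, shown here purely as an assumption, is to rank other talks by the Jaccard similarity of their tag sets (shared tags divided by all distinct tags). A minimal Go sketch with a hypothetical `Talk` type:

```go
package main

import (
	"fmt"
	"sort"
)

// Talk is a hypothetical, trimmed-down talk record: an ID plus its tags.
type Talk struct {
	ID   string
	Tags []string
}

// jaccard returns |A∩B| / |A∪B| for two tag lists (duplicates ignored).
func jaccard(a, b []string) float64 {
	setA := make(map[string]bool, len(a))
	for _, t := range a {
		setA[t] = true
	}
	setB := make(map[string]bool, len(b))
	for _, t := range b {
		setB[t] = true
	}
	inter := 0
	for t := range setA {
		if setB[t] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// relatedTalks returns up to k other talks ordered by descending tag
// similarity, skipping talks that share no tags with the target.
func relatedTalks(target Talk, all []Talk, k int) []Talk {
	type scored struct {
		talk  Talk
		score float64
	}
	var candidates []scored
	for _, t := range all {
		if t.ID == target.ID {
			continue
		}
		if s := jaccard(target.Tags, t.Tags); s > 0 {
			candidates = append(candidates, scored{t, s})
		}
	}
	// A stable sort keeps input order for ties, so pages render deterministically.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].score > candidates[j].score
	})
	if len(candidates) > k {
		candidates = candidates[:k]
	}
	out := make([]Talk, len(candidates))
	for i, c := range candidates {
		out[i] = c.talk
	}
	return out
}

func main() {
	talks := []Talk{
		{ID: "a", Tags: []string{"wireshark", "pcap", "network"}},
		{ID: "b", Tags: []string{"wireshark", "network"}},
		{ID: "c", Tags: []string{"powershell", "windows"}},
		{ID: "d", Tags: []string{"network", "nmap"}},
	}
	for _, t := range relatedTalks(talks[0], talks, 2) {
		fmt.Println(t.ID)
	}
}
```

Because the site is rendered statically, a ranking like this would only need to run once per build, so the quadratic comparison over all talks is an acceptable cost for a sketch.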

Some interesting facts I gathered while building it:

- (A) Traffic to the site is currently 94% bots. Of that, about 80,000 hits/month come from AI training crawlers (ClaudeBot, GPTBot, meta-externalagent). Within 7 days of the talks archive going live, all major AI labs had ingested the entire corpus. Watching the discovery cascade in real time was startling.

- (B) The taxonomy work was the hardest part. Distinguishing "tools" from "frameworks" from "protocols" from "concepts" sounds easy until you have 5,000 ambiguous extracted entities. The 3-tier LLM pipeline helped a lot: Haiku alone was too noisy, and Opus alone was too expensive.

- (C) Top tools mentioned: Wireshark (343), PowerShell (342), Metasploit (332), Burp Suite (322), GitHub (296), VirusTotal (273), Docker (253), Splunk (251), Nmap (247), MITRE ATT&CK (237). The list reflects what BSides talks actually discuss, not what vendors curate.

- (D) May is the peak BSides month: 29 events, or 17% of all events with known dates.

- (E) The top 1% of talks (86 videos by view count) account for 51% of all views. The other 99% are deeply niche, often the only video record of a specific technique.
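Item (A) implies classifying requests by user agent. A minimal Go sketch, assuming access-log User-Agent strings are available: it does a case-insensitive substring match on the three crawler names listed in (A). User agents can be spoofed, so this only approximates whatever the site's stats page actually measures, and the sample strings below are illustrative, not copied from real logs.

```go
package main

import (
	"fmt"
	"strings"
)

// aiCrawlerTokens are the crawler names mentioned in the post, lowercased
// for case-insensitive matching.
var aiCrawlerTokens = []string{"claudebot", "gptbot", "meta-externalagent"}

// isAICrawler reports whether a User-Agent string contains a known AI crawler token.
func isAICrawler(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, tok := range aiCrawlerTokens {
		if strings.Contains(ua, tok) {
			return true
		}
	}
	return false
}

// aiCrawlerShare returns the fraction of requests whose User-Agent matches.
func aiCrawlerShare(userAgents []string) float64 {
	if len(userAgents) == 0 {
		return 0
	}
	hits := 0
	for _, ua := range userAgents {
		if isAICrawler(ua) {
			hits++
		}
	}
	return float64(hits) / float64(len(userAgents))
}

func main() {
	// Illustrative, simplified User-Agent strings.
	logs := []string{
		"Mozilla/5.0 (compatible; GPTBot/1.1)",
		"Mozilla/5.0 (compatible; ClaudeBot/1.0)",
		"Mozilla/5.0 (Macintosh) Safari/605.1.15",
		"meta-externalagent/1.1",
	}
	fmt.Printf("AI crawler share: %.2f\n", aiCrawlerShare(logs))
}
```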
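Item (E) is a concentration measure: sort talks by views, take the top 1%, and divide their views by the total. A minimal Go sketch, assuming per-talk view counts are available; rounding the top-1% count down is an assumption, though with 8,643 talks it gives 86, matching the count in (E).

```go
package main

import (
	"fmt"
	"sort"
)

// topShare returns how many items make up the top `percent` by views
// (rounded down, minimum 1) and the fraction of total views they hold.
func topShare(views []int64, percent int) (count int, share float64) {
	if len(views) == 0 {
		return 0, 0
	}
	sorted := append([]int64(nil), views...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	count = len(sorted) * percent / 100
	if count < 1 {
		count = 1
	}
	var total, top int64
	for i, v := range sorted {
		total += v
		if i < count {
			top += v
		}
	}
	if total == 0 {
		return count, 0
	}
	return count, float64(top) / float64(total)
}

func main() {
	// Made-up view counts: 2 popular talks and 198 niche ones.
	views := make([]int64, 0, 200)
	views = append(views, 1000, 1000)
	for i := 0; i < 198; i++ {
		views = append(views, 10)
	}
	n, share := topShare(views, 1)
	fmt.Printf("top %d talks hold %.1f%% of views\n", n, share*100)
}
```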

The stack is intentionally lean: Go, SQLite, vanilla JavaScript, BunnyCDN. Static rendering at build time. No frameworks, no client-side state. The site costs about €50/month to run.

The data behind this post, and much more, is on the "stats" page linked in the site footer.

Happy to answer questions about the data pipeline, the taxonomy decisions, or what the AI crawler patterns looked like as the archive went live. Feedback on what to build next is genuinely welcome — I'm a solo dev figuring this out as I go.

— Roland (parkado)

Show HN: I Built a Museum Exhibit

https://knhash.in/built-an-exhibit/
4•kn81198•2d ago•0 comments

Show HN: nfsdiag – A NFS diagnostic application

https://github.com/lsferreira42/nfsdiag
37•lsferreira42•2d ago•3 comments

Show HN: Yames – A distraction-free desktop metronome built with Rust and Tauri

https://turutupa.github.io/yames/
2•turutupa•33m ago•0 comments

Show HN: Node-Vmm – Linux MicroVMs in Pure Node.js for Mac/Windows/Linux in ~1s

https://github.com/misaelzapata/node-vmm
3•misaelzapata•1h ago•0 comments

Show HN: NeuralScript – A pure-Rust AOT compiler

https://github.com/bwiemz/NSL
2•AkaiNa•4h ago•0 comments

Show HN: Apple's SHARP running in the browser via ONNX runtime web

https://github.com/bring-shrubbery/ml-sharp-web
181•bring-shrubbery•1d ago•43 comments

Show HN: Ableton Live MCP

https://github.com/bschoepke/ableton-live-mcp
112•bschoepke•1d ago•76 comments

Show HN: Muesli – If Granola and Wisprflow had an open source on device baby

https://freedspeech.xyz
9•pHequals7•7h ago•6 comments

Show HN: Bonsai 1.7B ternary model at 442T/s on M4 Max

https://agents2agents.ai/bonsai
12•hhuytho•8h ago•3 comments

Show HN: State of the Art of Coding Models, According to Hacker News Commenters

https://hnup.date/hn-sota
155•yunusabd•2d ago•86 comments

Show HN: I built a RISC-V emulator that runs DOOM

https://github.com/lalitshankarch/rvcore
46•Flex247A•1d ago•2 comments

Show HN: Pollen – distributed WASM runtime, no control plane, single binary

https://github.com/sambigeara/pollen
132•sambigeara•4d ago•59 comments

Show HN: DAC – open-source dashboard as code tool for agents and humans

https://github.com/bruin-data/dac
113•karakanb•5d ago•35 comments

Show HN: Pytest plugin that classifies why your CI failed

https://github.com/ahmad212o/pytest-cloudreport
3•ahmad212o•10h ago•0 comments

Show HN: Replacing spec-driven development with just facts

https://github.com/av/facts
7•everlier•10h ago•2 comments

Show HN: Privacy-First Pdf Converter

https://privapdf.net
4•omertt27•10h ago•5 comments

Show HN: Software Engineer to Novelist: Writing a Book Like Coding

https://frequal.com/forwriters/
21•TeaVMFan•1d ago•5 comments

Show HN: Parrot – a fun, skeuomorphic audio recorder to hear yourself

https://www.zkhrv.com/parrot
17•zkhrv•1d ago•2 comments

Show HN: WhatCable, a tiny menu bar app for inspecting USB-C cables

https://github.com/darrylmorley/whatcable
555•sleepingNomad•3d ago•166 comments

Show HN: Mljar Studio – local AI data analyst that saves analysis as notebooks

https://mljar.com/
70•pplonski86•2d ago•16 comments

Show HN: AI CAD Harness

https://fusion.adam.new/install
98•zachdive•3d ago•95 comments

Show HN: Browser-based light pollution simulator using real photometric data

https://iesna.eu/?wasm=skyglow_demo
42•holg•2d ago•16 comments

Show HN: Piruetas – A self-hosted diary app I built for my girlfriend

https://piruet.app
70•patillacode•2d ago•49 comments

Show HN: Filling PDF forms with AI using client-side tool calling

https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228e6ff7b11eb3f2d945b6175913e87898ec96ca...
58•nip•2d ago•25 comments

Show HN: Large Scale Article Extract of Newspapers 1730s-1960s

https://snewpapers.com/
51•brettnbutter•2d ago•20 comments

Show HN: Site Mogging

https://sitemogging.com
68•jilles•3d ago•76 comments

Show HN: Stop playing my matchstick puzzles, start building your own in seconds

https://mathstick.github.io
36•trangram•2d ago•33 comments

Show HN: Loopsy, a way for terminals and AI agents on different machines to talk

https://github.com/leox255/loopsy
57•todience•3d ago•12 comments

Show HN: Kula – a family health platform that makes sense of your data

6•samuraikmc•19h ago•10 comments