Show HN: WhiskeySour – A 10x faster drop-in replacement for BeautifulSoup

7•ayas_behera•3h ago
The Problem

I’ve been using BeautifulSoup for some time. It’s the standard for ease of use in Python scraping, but it almost always becomes the performance bottleneck when processing large-scale datasets.

Parsing complex or massive HTML trees in Python typically suffers from high memory allocation costs and the overhead of the Python object model during tree traversal. In my production scraping workloads, the parser was consuming more CPU cycles than the network I/O. lxml is fast, but it also uses a lot of memory when processing large documents and can cause trouble with malformed HTML.

The Solution

I wanted to keep the API compatibility that makes BS4 great but eliminate the overhead that slows down high-volume pipelines. That’s why I built WhiskeySour. It uses html5ever. And yes… I *vibe coded the whole thing*.

WhiskeySour is a drop-in replacement. You should be able to swap "from bs4 import BeautifulSoup" for "from whiskeysour import WhiskeySour" and see immediate speedups. Workflows that used to take more than 30 minutes might now take less than 5.
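
A minimal sketch of that swap with a fallback, so existing code keeps working on machines where WhiskeySour is not installed. The module and class names come from the import lines quoted above. The helper name `load_soup_class` is hypothetical, and the constructor call in the demo assumes WhiskeySour accepts the same arguments as BeautifulSoup, which is what the drop-in claim implies but is not verified here.

```python
import importlib


def load_soup_class():
    """Return WhiskeySour if importable, else BeautifulSoup, else None.

    Hypothetical helper: tries each (module, class) pair in order and
    returns the first class it can find.
    """
    candidates = (
        ("whiskeysour", "WhiskeySour"),  # names taken from the post
        ("bs4", "BeautifulSoup"),
    )
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # library not installed, try the next one
        soup_class = getattr(module, class_name, None)
        if soup_class is not None:
            return soup_class
    return None


if __name__ == "__main__":
    SoupClass = load_soup_class()
    if SoupClass is None:
        print("Neither whiskeysour nor bs4 is installed.")
    else:
        # Assumption: WhiskeySour takes the same (markup, parser) arguments
        # and exposes the same find()/get_text() methods as BeautifulSoup.
        soup = SoupClass("<p>Hello <b>world</b></p>", "html.parser")
        print(SoupClass.__name__, "->", soup.find("b").get_text())
```

The rest of a scraper can then use the returned class under one name, which makes it easy to switch back if a regression shows up.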

I have shared the detailed architecture of the library here: https://the-pro.github.io/whiskeySour/architecture/

Here is the benchmark report against bs4 with html.parser: https://the-pro.github.io/whiskeySour/bench-report/

Here is the link to the repo: https://github.com/the-pro/WhiskeySour

Why I’m sharing this

I’m looking for feedback from the community on two fronts:

1. Edge cases: If you have particularly messy or malformed HTML that BS4 handles well, I’d love to know if WhiskeySour encounters any regressions.

2. Benchmarks: If you are running high-volume parsers, I’d appreciate it if you could run a test on your own datasets and share the results.
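
For anyone who wants to run such a comparison, here is a rough timing harness, offered as a sketch rather than the methodology behind the linked benchmark report. It times any set of parse callables over the same documents and reports the median wall-clock time. Only the standard library is used, so the one registered parser is a stdlib `html.parser` stand-in; the names `CountingParser`, `parse_with_stdlib`, and `benchmark` are hypothetical.

```python
import statistics
import time
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Stdlib stand-in parser that only counts start tags."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1


def parse_with_stdlib(html):
    """Parse one document and return the number of start tags seen."""
    parser = CountingParser()
    parser.feed(html)
    parser.close()
    return parser.tag_count


def benchmark(parsers, documents, repeats=5):
    """Return {name: median seconds to parse every document once}."""
    results = {}
    for name, parse in parsers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for html in documents:
                parse(html)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


if __name__ == "__main__":
    # Synthetic input; replace with HTML read from your own dataset.
    docs = ["<html><body>" + "<p>row <b>x</b></p>" * 1000 + "</body></html>"] * 20
    parsers = {"stdlib html.parser": parse_with_stdlib}
    for name, seconds in benchmark(parsers, docs).items():
        print(f"{name}: {seconds:.4f} s (median of 5 runs)")
```

With bs4 installed, an entry such as `"bs4": lambda html: BeautifulSoup(html, "html.parser")` can be added to `parsers`. A WhiskeySour entry would look the same only if its constructor really mirrors BeautifulSoup's, which is an assumption based on the drop-in claim. Feeding the same malformed documents to both also gives a quick way to spot the edge-case differences asked about in point 1.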

Comments

skidiwub•2h ago
Yet more slop.
