
Show HN: WhiskeySour – A 10x faster drop-in replacement for BeautifulSoup

6•ayas_behera•1h ago
The Problem

I’ve been using BeautifulSoup for some time. It’s the standard for ease of use in Python scraping, but it almost always becomes the performance bottleneck when processing large-scale datasets.

Parsing complex or massive HTML trees in Python typically suffers from high memory-allocation costs and from the overhead of the Python object model during tree traversal. In my production scraping workloads, the parser was consuming more CPU time than the network I/O. lxml is fast, but it also uses a lot of memory on large documents and can run into trouble with malformed HTML.

The Solution

I wanted to keep the API compatibility that makes BS4 great while eliminating the overhead that slows down high-volume pipelines. That’s why I built WhiskeySour. Under the hood it uses html5ever, the Rust HTML5 parser from the Servo project. And yes… I *vibe coded the whole thing*.

WhiskeySour is a drop-in replacement. You should be able to swap "from bs4 import BeautifulSoup" for "from whiskeysour import WhiskeySour" and see immediate speedups. Workflows that used to take more than 30 minutes might now take less than 5.
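One cautious way to try the swap without committing to it is to import whichever class is installed and fall back to bs4. The sketch below uses a hypothetical helper, `first_available`, which is not part of either library. The module and class names come from the post, and the sketch assumes, as the post claims, that `WhiskeySour` accepts the same constructor arguments as `BeautifulSoup`.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first importable attribute from (module, attribute) pairs.

    Returns None if no candidate can be imported.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not installed; try the next candidate
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


# Prefer WhiskeySour if installed, otherwise fall back to BeautifulSoup.
# Assumption (from the post's drop-in claim): both accept the same arguments.
Soup = first_available([
    ("whiskeysour", "WhiskeySour"),
    ("bs4", "BeautifulSoup"),
])
```

Downstream code then constructs `Soup(...)` exactly where it used to construct `BeautifulSoup(...)`. If `Soup` is `None`, neither library is installed.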

I have shared the detailed architecture of the library here: https://the-pro.github.io/whiskeySour/architecture/

Here is the benchmark report against bs4 with html.parser: https://the-pro.github.io/whiskeySour/bench-report/
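To reproduce a comparison like this on your own corpus, a minimal timing harness is enough. `time_parser` is a hypothetical helper, not part of WhiskeySour or bs4. The demo parses with the standard library's `html.parser.HTMLParser` only so the sketch runs without extra installs. The harness takes the median wall-clock time over several full passes, which damps noise but says nothing about memory use.

```python
import statistics
import time
from html.parser import HTMLParser
from typing import Callable, Sequence


def time_parser(parse: Callable[[str], object],
                docs: Sequence[str],
                repeats: int = 5) -> float:
    """Return the median wall-clock seconds to parse every doc in `docs` once."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for html in docs:
            parse(html)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def stdlib_parse(html: str) -> None:
    # Stand-in parser so the example runs with only the standard library.
    parser = HTMLParser()
    parser.feed(html)
    parser.close()


corpus = ["<html><body>" + "<p>row</p>" * 1000 + "</body></html>"] * 20
print(f"html.parser: {time_parser(stdlib_parse, corpus):.4f}s")

# Hypothetical comparison on a real corpus (requires bs4 and whiskeysour):
#   bs4_s = time_parser(lambda h: BeautifulSoup(h, "html.parser"), corpus)
#   ws_s = time_parser(lambda h: WhiskeySour(h, "html.parser"), corpus)  # assumed signature
#   print(f"speedup: {bs4_s / ws_s:.1f}x")
```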

Here is the link to the repo: https://github.com/the-pro/WhiskeySour

Why I’m sharing this

I’m looking for feedback from the community on two fronts:

1. Edge cases: If you have particularly messy or malformed HTML that BS4 handles well, I’d love to know whether WhiskeySour shows any regressions on it.

2. Benchmarks: If you are running high-volume parsers, I’d appreciate it if you could run a test on your own datasets and share the results.
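For the edge-case request, one low-effort approach is differential testing: run the same pages through both parsers, reduce each tree to a comparable summary (such as the list of tag names), and flag disagreements. `find_regressions` and `stdlib_tags` are hypothetical helpers. The stdlib-based extractor only stands in so the sketch runs anywhere. Because html5ever follows the WHATWG HTML5 parsing algorithm, it may legitimately build different trees than `html.parser` on malformed input, so bs4's `html5lib` tree builder, which follows the same algorithm, is likely a fairer reference.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Sequence


def find_regressions(reference: Callable[[str], object],
                     candidate: Callable[[str], object],
                     docs: Sequence[str]) -> List[Dict[str, object]]:
    """Return one record per document where `candidate` disagrees with `reference`."""
    mismatches = []
    for i, html in enumerate(docs):
        results = []
        for extract in (reference, candidate):
            try:
                results.append(extract(html))
            except Exception as exc:  # a crash is recorded as a differing result
                results.append(f"<error: {type(exc).__name__}: {exc}>")
        if results[0] != results[1]:
            mismatches.append({"index": i, "reference": results[0],
                               "candidate": results[1]})
    return mismatches


class _TagCollector(HTMLParser):
    """Records start-tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def stdlib_tags(html: str) -> List[str]:
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Hypothetical real-world use (requires bs4, html5lib and whiskeysour):
#   ref = lambda h: [t.name for t in BeautifulSoup(h, "html5lib").find_all(True)]
#   cand = lambda h: [t.name for t in WhiskeySour(h).find_all(True)]  # assumes bs4-style API
#   for m in find_regressions(ref, cand, my_messy_pages):
#       print(m["index"])
```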

Comments

skidiwub•1h ago
Yet more slop.

SpaceX: Test Like You Fly [video]

https://www.spacex.com/content/starship/test-like-you-fly
1•w8vY7ER•1m ago•1 comments

Buddhist monk builds irreverent classifieds for lonely human mortals

https://chickenlist.com
1•ascottaggart•2m ago•0 comments

Show HN: Mux0 – Open-source macOS terminal with workspace tabs and agent hooks

https://mux0.com/
1•Justin3go•3m ago•0 comments

Andromeda – Making local AI accessible to non-technical users

https://store.steampowered.com/app/4056090/SmarterWaysProductions_Andromeda/
1•klueglscheisser•3m ago•1 comments

Niri 26.04 was just released (scrollable-tiling Wayland compositor)

https://github.com/niri-wm/niri/releases/tag/v26.04
1•nickjj•5m ago•0 comments

The physics slop that YouTube wants me to make [video]

https://www.youtube.com/watch?v=Cd5EHfRerGI
1•thorum•5m ago•0 comments

NATO eyes Saab GlobalEye to replace AWACS planes in historic shift from the U.S.

https://www.armyrecognition.com/news/aerospace-news/2026/nato-selects-swedish-saab-globaleye-to-r...
2•vrganj•6m ago•0 comments

The Beautiful Barbell Effect

https://camerasearch.substack.com/p/the-beautiful-barbell-effect
1•Aeroi•7m ago•0 comments

Show HN: I gave Claude and Cursor a seat on my Kanban board [video]

https://www.youtube.com/watch?v=CD2-NGtshrY
1•spotlayn•7m ago•0 comments

Graphite open source hybrid image editor

https://www.graphite.art/
2•tomcam•8m ago•0 comments

Writing a book is a labor of love

https://usefulfictions.substack.com/p/writing-a-book-is-a-labor-of-love
2•eatitraw•9m ago•0 comments

Why Silicon Valley Is Turning to the Catholic Church

https://www.theatlantic.com/ideas/2026/04/silicon-valley-catholicism-ai-leo/686948/
2•jonah•10m ago•0 comments

Show HN: Odozi – open-source iOS journaling app

https://odozi.app
2•jlarks32•11m ago•0 comments

What's Missing in the 'Agentic' Story

https://www.mnot.net/blog/2026/04/24/agents_as_collective_bargains
4•ingve•13m ago•0 comments

Baldmaxxing Confidence Mobile App – No BS

https://baldandwinning.com/en
2•thisissidhant•15m ago•1 comments

GLP-1 receptor agonist effects on Alzheimer's pathophysiology: Systematic review

https://www.sciencedirect.com/science/article/pii/S1044743126000217
2•bookofjoe•16m ago•0 comments

A validation-gated execution system (VYRDON)

https://github.com/teee79A/vyrdon
2•vyrdon•18m ago•0 comments

Your Job Isn't Programming

https://codeandcake.dev/posts/2025-12-12-your-job-isnt-programming
2•dgroshev•20m ago•0 comments

Trump alum helps Israel mount AI influence campaign

https://www.axios.com/2026/04/25/israel-ai-influence-parscale
2•sosomoxie•21m ago•0 comments

Mo RAM, Mo Problems

https://fabiensanglard.net/curse/
3•blfr•22m ago•0 comments

From Gaza With Love (1h49 documentary)

https://www.youtube.com/watch?v=xYj-XMIjHGo
2•bigbugbag•23m ago•1 comments

Fast Attention for Short Sequences

https://blog.qwertyforce.dev/posts/fast_attention_for_short_sequences
2•qwertyforce•27m ago•0 comments

The Glass_V1 standard for new computational storage

https://github.com/argoscollective/Glass-V1-Standard
2•Argoscollective•28m ago•0 comments

Over the Past Decade, Congestive Heart Failure Increased by over 10%

https://www.scai.org/media-center/news-and-articles/over-past-decade-congestive-heart-failure-and...
2•geox•28m ago•0 comments

Vibe Designing

https://jonathannen.com/vibe-designing/
2•speckx•30m ago•0 comments

Some data on the shape of the forgetting curve

https://www.natemeyvis.com/some-data-on-the-shape-of-the-forgetting-curve/
2•ingve•31m ago•0 comments

Show HN: Outworx Docs – Hosted API docs with an MCP server per project

https://docs.outworx.io
2•aemadeldin•33m ago•0 comments

Software Piracy Statistics – 2026 Outlook

https://www.revenera.com/blog/software-monetization/software-piracy-stat-watch/
2•keepamovin•34m ago•0 comments

From car and phone to tractors, populist wave to end 'captive' repair economy

https://www.cnbc.com/2026/04/25/right-to-repair-consumer-prices-affordability-economy-elections.html
3•1vuio0pswjnm7•34m ago•0 comments

The Fermi Paradox Is Nerdslop

https://monismos.substack.com/p/the-fermi-paradox-is-nerdslop
2•eatitraw•39m ago•0 comments