I scraped 1.94M Airbnb photos for opium dens, pet cameos, and messy kitchens

https://burla-cloud.github.io/examples/airbnb-burla-demo/
62•jmp1062•4h ago

Comments

gavmor•1h ago
These are amazing! Some are probably offensive, because I saw a cozy, if kitschy, British den labeled as "did-someone-just-leave" vibes which... unfair.
jperryjperry•51m ago
do you know the listing number? will remove that one haha
wheelerwj•1h ago
This thing is ripe for a lawsuit and has terrible methodology as far as I can tell.
smrtinsert•51m ago
On what grounds is there a lawsuit? Hasn't scraping been classified as legal?
happyopossum•23m ago
Calling someone’s apartment an opium den is potentially libel, and if it results in a material financial impact, you’ve got a lawsuit.
wheelerwj•21m ago
classifying people's businesses as an "opium den" using a shitty LLM prompt seems like a pretty good way to piss some people off.
guywithahat•1h ago
This is pretty great, the reviews at the bottom are the best part. I'm impressed they were able to scrape so much data
danhon•1h ago
"Looking at every public Airbnb listing in Inside Airbnb's open data dump, all at once, on Burla"

This Inside Airbnb?

Community Guidelines

Please:

Only take the data you need. Do not scrape data from the site, if you would like to subscribe to the data directly, please email data@insideairbnb.com

yodon•1h ago
>Everything was parallelized on Burla, on a single dynamic cluster that scaled to ~1.7K CPU workers for photo download and CLIP, with 20 A100 GPUs running embedding clusters in parallel on the same cluster.

That's a lot of budget - would have been nice if they'd made an actual donation to the project, instead of pounding the project's servers and bandwidth when there are much better ways to interact with the data.

jperryjperry•52m ago
Totally fair callout. I should’ve been more careful here and leaned on the provided datasets / bulk access instead of pulling things at scale. That’s on me.

I’ll make a donation to support the project regardless. Appreciate you raising it.

danhon•37m ago
... so you'd only end up making a donation if you ended up "stressing the project's infra more than expected"?!
xikrib•1h ago
Ah yes, let's price the world out of the real estate market and then use insanely powerful AI models to systematically mock the living conditions of the poors.
NoLinkToMe•1h ago
What a waste of energy (money/resources)... Scraping and AI-scanning 2 million photos to identify animals in the advertisement pictures? What's the point?

As an exercise a sample of 1000 photos would've been enough. As a database, knowing a listing has a cat in the picture or a funny review doesn't offer any real value.

I wonder what the footprint is of such an exercise.

jperryjperry•54m ago
The pet detection part isn’t the point, that’s just a visible output. The actual goal was to stress test agents + distributed compute on something non-trivial.
ericmcer•50m ago
I dunno there are literally 100s of millions (billions?) of people who spend more than an hour per day just scrolling through social media feeds.

How much does it cost to send a billion people an hour of video every day? Almost all of the resources tech uses are for pointless or even negative things.

What % of compute/bandwidth do you think is used for "real value"? I would guess it is well below 1%.

xrd•1h ago
Airbnb was actually started by two guys who created an opium den for Obama's convention so this doesn't surprise me.
htrp•1h ago
This seems like an advertisement for an open source package

>Scale Python across 1,000 CPUs or GPUs in 1 second. Burla is a high-performance parallel processing library with an extremely fast developer experience. Scale batch processing, vector embeddings, inference, or build pipelines with dynamic hardware.

Edit: Author comment was flagged dead. They work at burla which is a managed cloud service for parallelizing python

andai•1h ago
Looks like it was hit by some sort of automated ChatGPT detector.
nickjantz•1h ago
Am I missing something other commenters are seeing about this not being an ad? The domain is on Burla, which hosted the compute needed for this. There's a giant airbnb x burla logo at the top. People are saying there's a lawsuit pending, it's against guidelines, what's the point, etc.

It's content marketing, plain and simple, for Burla, aimed at people who view this site. It was highly likely done by employees at both Burla and Airbnb together as a joint project.

jperryjperry•55m ago
One of the Burla founders here. Not a joint project with Airbnb. I’ve been experimenting with giving agents access to Burla clusters and letting them run with analysis ideas I find interesting. This was one of the results.

The branding is a bit much, fair call, but the intent here was just to explore what these agents can actually build when you give them access to large amounts of compute.

add-sub-mul-div•52m ago
How many accounts do you have spamming your projects here?
zamadatix•40m ago
Looks like just 2 accounts with 11 total submissions in the last year, both with disclosures in the comments and/or profile https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....

This post is a bit lighter on that disclosure than I'd like (and isn't as obvious as a Show HN would be) but I feel I'm missing some big portion of the backstory to this comment?

add-sub-mul-div•54m ago
This vanity scraping is fucking up the internet for everyone else.

It's hardly the only thing, but it's part of the problem.

jperryjperry•47m ago
Fair feedback. Definitely more backlash than I expected. The intent was to experiment with large-scale analysis, not add noise or put strain on shared resources. I’ll be more thoughtful about this kind of thing going forward.
devmor•48m ago
The author makes some pretty insane leaps in logic for classification, and it’s apparent in the photos.

“Drug-Den vibes” apparently means the owner is poor or a photo is obscured or badly lit.

wheelerwj•21m ago
yeah, there's a lot of trash assumptions going on here.
dwroberts•40m ago
“Drug den vibes” and they’re mostly just small rooms?
jperryjperry•15m ago
some are more psychedelic drug vibes and others are just insanely messy.

I've had shitty and small apartments many times and that doesn't prevent me from cleaning them, especially if I'm going to rent one out.

nickthegreek•10m ago
Apparently if your resting place lacks a headboard, you abuse chemicals.
guywithahat•6m ago
I feel like floor mattresses, trash, and peeling paint were also at play. They're all sort of unsafe rooms people wouldn't want to go to unless they felt like they had to (i.e. doing drugs)
GrinningFool•33m ago
I'm struggling a bit with how the 'funniest' ranked reviews are genuine descriptions of people's miserable (and sometimes unsafe) experiences. Where's the funny?

As an experitisement, I guess it gets the name out there but not in any way I'd want for my business.

jperryjperry•18m ago
Personally I find those experiences really funny, especially in my own life. Looking back, I think most people find humor in them. I could be wrong, but I don't think so.

How Mark Klein told the EFF about Room 641A [book excerpt]

https://thereader.mitpress.mit.edu/the-whistleblower-who-uncovered-the-nsas-big-brother-machine/
216•the-mitr•2h ago•42 comments

Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library

https://semgrep.dev/blog/2026/malicious-dependency-in-pytorch-lightning-used-for-ai-training/
179•j12y•3h ago•47 comments

I built a Game Boy emulator in F#

https://nickkossolapov.github.io/fame-boy/building-a-game-boy-emulator-in-fsharp/
81•elvis70•1h ago•19 comments

CopyFail Was Not Disclosed to Distros

https://www.openwall.com/lists/oss-security/2026/04/30/10
131•ori_b•2h ago•72 comments

Belgium stops decommissioning nuclear power plants

https://dpa-international.com/general-news/urn:newsml:dpa.com:20090101:260430-930-14717/
615•mpweiher•6h ago•527 comments

Claude Code refuses requests or charges extra if your commits mention "OpenClaw"

https://twitter.com/theo/status/2049645973350363168
538•elmean•4h ago•345 comments

How an Oil Refinery Works

https://www.construction-physics.com/p/how-an-oil-refinery-works
211•chmaynard•5h ago•48 comments

Durable queues, streams, pub/sub, and a cron scheduler – inside your SQLite file

https://honker.dev/
98•ferriswil•4h ago•19 comments

You can beat the binary search

https://lemire.me/blog/2026/04/27/you-can-beat-the-binary-search/
155•vok•3d ago•79 comments

If Apple makes an iPad Neo, it's all over

https://www.techadvisor.com/article/3128472/if-apple-makes-an-ipad-neo-its-all-over.html
12•ndr42•32m ago•3 comments

I aggregated 28 US Government auction sites into one search

https://bidprowl.com
186•scarsam•6h ago•55 comments

Spain's parliament will act against massive IP blockages by LaLiga

https://www.democrata.es/en/politics/congress-and-senate/congress-will-act-against-massive-ip-blo...
270•akyuu•3h ago•110 comments

10Gb/s Ethernet: what I did to get it working in my home

https://www.gilesthomas.com/2026/04/10g-ethernet-what-i-did
54•gpjt•1d ago•27 comments

Mozilla's opposition to Chrome's Prompt API

https://github.com/mozilla/standards-positions/issues/1213
458•jaffathecake•11h ago•189 comments

A 1960s art school experiment that redefined creativity

https://thereader.mitpress.mit.edu/the-1960s-art-school-experiment-that-redefined-creativity/
44•pseudolus•3h ago•8 comments

Recovering files from beyond the grave using PhotoRec

https://lost-number.bearblog.dev/recovering-files-from-beyond-the-grave-using-photorec/
12•speckx•1h ago•1 comments

Granite 4.1: IBM's 8B Model Matching 32B MoE

https://firethering.com/granite-4-1-ibm-open-source-model-family/
241•steveharing1•8h ago•151 comments

The Zig project's rationale for their anti-AI contribution policy

https://simonwillison.net/2026/Apr/30/zig-anti-ai/
595•lumpa•16h ago•371 comments

Noctua releases official 3D CAD models for its cooling fans

https://www.noctua.at/en/3d-cad-models
454•embedding-shape•2d ago•99 comments

Where the goblins came from

https://openai.com/index/where-the-goblins-came-from/
980•ilreb•15h ago•585 comments

How Semiconductors Were Made in America

https://www.siliconimist.com/p/semiconductors-made-in-america
10•johncole•2d ago•2 comments

The Science Behind Honey's Eternal Shelf Life (2013)

https://www.smithsonianmag.com/science-nature/the-science-behind-honeys-eternal-shelf-life-1218690/
47•downbad_•5h ago•28 comments

A Primer on Bézier Curves – So What Makes a Bézier Curve?

https://pomax.github.io/bezierinfo/
101•mostlyk•2d ago•21 comments

Kubereboot/Kured: Kubernetes Reboot Daemon

https://github.com/kubereboot/kured
10•ankitg12•2h ago•1 comments

Show HN: TRiP – a complete transformer engine in C built from scratch just by me

https://github.com/carlovalenti/TRiP
16•carlovalenti•2h ago•1 comments

Full-Text Search with DuckDB

https://peterdohertys.website/blog-posts/full-text-search-w-duckdb.html
4•ethagnawl•55m ago•0 comments

My Stratum-0 Atomic Clock

https://coverclock.blogspot.com/2017/05/my-stratum-0-atomic-clock_9.html
62•g0xA52A2A•3d ago•21 comments

What can we gain by losing infinity?

https://www.quantamagazine.org/what-can-we-gain-by-losing-infinity-20260429/
82•Tomte•1d ago•86 comments

Craig Venter has died

https://www.jcvi.org/media-center/j-craig-venter-genomics-pioneer-and-founder-jcvi-and-diploid-ge...
319•rdl•17h ago•76 comments

Largest Digital Human Rights Conference Suddenly Canceled

https://www.404media.co/rightscon-human-rights-conference-suddenly-postponed/
53•Brajeshwar•2h ago•9 comments