Turn any website into an API

https://www.parse.bot
105•pcl•6mo ago

Comments

runningmike•6mo ago
Nice idea. In practice many sites have different methods to prevent scraping. Large risk on doing things manually imho.
renegat0x0•6mo ago
Huh, I have been working on a solution to that problem.

My project lets you define rules for various sites, so eventually everything is scraped correctly. For YouTube, yt-dlp is also used to augment results.

I can crawl using requests, Selenium, HTTPX and others. The response is JSON, so it is easy to process.

The downside is that it may not be the fastest solution, and I have not tested it against proxies.

https://github.com/rumca-js/crawler-buddy

with•6mo ago
pretty cool idea. using stagehand under the hood?
vin047•6mo ago
No information on pricing on the site.
thrdbndndn•6mo ago
I scrape website content regularly (usually as one-offs) and have a hand-crafted extractor template where I just fill in a few arguments (mainly CSS selectors and some options) to get it working quickly. These days, I do sometimes ask AI to do this for me by giving it the HTML.

The issue is that for any serious use of this concept, some manual adjustment is almost always needed. This service says, "Refine your scraper at any time by chatting with the AI agent," but from what I can tell, you can't actually see the code it generates.

Relying solely on the results and asking the AI to tweak them can work, but often the output is too tailored to a specific page and fails to generalize (essentially "overfitting"). And surprisingly, this back-and-forth can be more tedious and time-consuming than just editing a few lines of code yourself. Also, if you can't directly edit the code behind the scenes, there are situations where you'll never be able to get the exact result you want, no matter how much you try to explain it to the AI in natural language.
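
For illustration only, here is a rough sketch of what a fill-in-the-arguments extractor template could look like. It is not the commenter's actual template: the `FieldExtractor` class, the `extract` helper, and the field names are all hypothetical, and (tag, class) pairs stand in for real CSS selectors so the example runs on the Python standard library alone.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class FieldExtractor(HTMLParser):
    """Hypothetical template: map field names to (tag, class) pairs."""

    def __init__(self, fields: Dict[str, Tuple[str, str]]):
        super().__init__()
        self.fields = fields
        self.result: Dict[str, str] = {}
        self._current: Optional[str] = None  # field currently being captured

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # already inside a captured element
        classes = (dict(attrs).get("class") or "").split()
        for name, (want_tag, want_class) in self.fields.items():
            # Only the first matching element per field is captured.
            if name not in self.result and tag == want_tag and want_class in classes:
                self._current = name
                self.result[name] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.result[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: capture ends at the first closing tag of the same
        # name, so nested elements with the same tag are not handled.
        if self._current is not None and tag == self.fields[self._current][0]:
            self.result[self._current] = self.result[self._current].strip()
            self._current = None


def extract(html: str, fields: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Run the template over one page and return the captured text per field."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    parser.close()
    return parser.result
```

With something like this, adapting to a new site means changing only the `fields` argument, e.g. `extract(page, {"title": ("h1", "title"), "price": ("span", "price")})`, which is the kind of direct edit that is not possible when the generated code is hidden.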

throwup238•6mo ago
I’ve had no shortage of trouble using LLMs for scrapers because for some reason they almost always ignore my instructions to use something other than the class name for selectors. They love to use the hashed class names (from emotion/styled/whatever css-in-js library du jour) that change way too often.
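
One possible workaround is to filter out generated-looking class names before building a selector. The sketch below is only a heuristic: the prefixes (`css-`, `sc-`, `jss-`, `emotion-`) and the "suffix contains a digit or an uppercase letter" rule are assumptions about what such libraries tend to emit, and the function names are made up for this example.

```python
import re
from typing import List

# Assumed shape of a generated class name: a known prefix, a hyphen, then a
# short alphanumeric hash. Real libraries vary, so treat this as illustrative.
_PREFIXED = re.compile(r"^(?:css|sc|jss|emotion)-([A-Za-z0-9]{5,})$")


def looks_generated(name: str) -> bool:
    """Guess whether a class name was produced by a CSS-in-JS library."""
    match = _PREFIXED.match(name)
    if match is None:
        return False
    suffix = match.group(1)
    # Hand-written names such as "sc-panel" are usually all lowercase letters;
    # hashes tend to mix in digits or uppercase characters.
    return any(ch.isdigit() or ch.isupper() for ch in suffix)


def stable_classes(class_attr: str) -> List[str]:
    """Keep only the class names that do not look generated."""
    return [name for name in class_attr.split() if not looks_generated(name)]
```

Running the class attribute through `stable_classes` before selector generation leaves only names like `product-title`; if nothing survives, the selector has to fall back on tag structure or other attributes.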
websiteapi•6mo ago
I'm surprised (and could be wrong) that no one has made a Chrome extension that just controls a page and exposes the output to localhost for consumption as an API. Similar to using Chrome WebDriver, but without the setup.
ExxKA•6mo ago
Isn't that basically what browser-use is?
kevindamm•6mo ago
I kind of agree and don't. You could say HTTP+DOM is the API, we're already there. But it lacks the structure and a more explicit regularity (in part because it's meant for human consumption, not programming). And if you were to describe the whole protocol (including CSS and JS as they can change ordering, even content, of what's shown) it's incredibly more complicated than the equivalent, distilled representation.

There are efforts going back at least fifteen years to extract ontologies from natural language [0] and HTML structure [1].

[0]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... (2010) [PDF]

[1]: https://doi.org/10.1016/j.dss.2009.02.011 (2009)

meatjuice•6mo ago
It's not a browser extension, but controlling the actual browser without using webdriver is already a thing.

https://github.com/autoscrape-labs/pydoll

_1tem•6mo ago
Way too little information on the homepage. Does this handle pagination? What about sites behind authentication? I assume the generated API is stable, i.e. the shape of the JSON will not change after a scraper is built, but what if the site changes its DOM, does the scraper need to be regenerated? Does this attempt to defeat anti-bot and anti-scraper walls like Cloudflare?
ExxKA•6mo ago
No no, it's good that it's simple to understand.

All those details can go in the docs / faqs section.

slightwinder•6mo ago
Where are those docs?
ExxKA•6mo ago
I really like the simplicity of the offering. The website looks great (to a human) and explains the API idea very simply. Good stuff!
verelo•6mo ago
Mobile UX is completely broken. This would be a 5 min fix with Claude and Cursor. Signals to me that I can expect the backend to struggle with anything basic like a captcha etc.
maticzav•6mo ago
i love the idea!

i know that https://expand.ai/ is doing something similar, maybe worth checking out

Joeboy•6mo ago
This is relevant to my interests[0]

Based on the website I was quite skeptical. It looks too much like an "indiehacker", minimum-almost-viable-product, fake-it-till-you-make-it, trolling-for-email-addresses kind of website.

But after a quick search on twitter, it seems like people are actually using it and reporting good results. Maybe I'll take a proper look at it at some point.

I'd still like to know more about pricing, how it deals with cloudflare challenges, non-semantic markup and other awkwardnesses.

[0] https://github.com/Joeboy/cinescrapers

artluko•6mo ago
I saw your video on YouTube, really impressive.
Aaargh20318•6mo ago
It’s a cute idea, but ultimately not very useful. An API is more than just an endpoint that gives easy to parse results. The most important part is that an API is a contract. An API implies that things won’t suddenly break without prior announcement. Any form of web-scraping, no matter how cleverly done, is inherently fragile. They can change their front-end for any reason which could break your scraper. As such you cannot rely on such an interface.
autonomousErwin•6mo ago
I wonder if not just checking the site every day (or minute) would solve for this.

It's not necessarily the structure of the source data (the DOM, the HTML etc.) but rather the translator that needs to be contractually consistent. The translator in this case is the service for the endpoints.

Aaargh20318•6mo ago
> I wonder if not just checking the site every day (or minute) would solve for this.

No, because a webpage makes no promise to not change. Even if you check every minute, can your system handle random 1 minute periods of unpredictable behavior? What if they remove data? What if the meaning of the data changes (e.g. instead of a maximum value for some field they now show the average value) how would your system deal with that? What if they are running an A/B test and 10% of your ‘API’ requests return a different page?

This is not a technical problem and the solution is not a technical one. You need to have some kind of relationship with the entity whose data you are consuming or be okay with the fact that everything can just stop working at any random moment in time.

10000truths•6mo ago
That's just part and parcel of relying on third parties - you should always price in the maintenance burden of keeping up with potential changes on their end. That burden is a lot lower if the third party cooperates with you and provides an explicit contract and backwards compatibility, but it's still not zero.
Aaargh20318•6mo ago
It’s not about the maintenance cost, it’s about continuity of service. If you scrape a website things may break at any time. If you use a proper API and have a contract with the supplier you will have the opportunity to make any changes before things break.
hoppp•5mo ago
You download the HTML, hash it with SHA-512, then run the AI and the web scraping, and cache the API content.

When the cache is invalidated, you refetch the HTML, check the SHA-512 hash to see if anything changed, then proceed based on yes or no.

Or something like that. It's not fast, but hashing and comparing is fast compared to inference anyway.
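
A minimal sketch of that hash-and-cache idea, under some assumptions: `fetch` and `extract` are caller-supplied placeholders (the latter stands in for the slow AI/scraping step), the in-memory `_cache` dict is hypothetical, and cache expiry is left out, so the page is refetched and rehashed on every call.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache: URL -> (SHA-512 hex digest of the HTML, result)
_cache: Dict[str, Tuple[str, dict]] = {}


def html_digest(html: str) -> str:
    """Return the SHA-512 hex digest of the page source."""
    return hashlib.sha512(html.encode("utf-8")).hexdigest()


def get_api_content(
    url: str,
    fetch: Callable[[str], str],
    extract: Callable[[str], dict],
) -> dict:
    """Re-run the expensive extraction only when the page's hash changes."""
    html = fetch(url)
    digest = html_digest(html)
    cached = _cache.get(url)
    if cached is not None and cached[0] == digest:
        return cached[1]  # page bytes unchanged, reuse the cached result
    result = extract(html)  # stand-in for the slow AI/scraping step
    _cache[url] = (digest, result)
    return result
```

Note that a hash over the raw HTML changes whenever any byte changes, so pages that embed timestamps or per-request tokens would miss the cache every time; hashing only the relevant fragment would be one way around that.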

Aaargh20318•5mo ago
I’m not sure what that would solve? Your API call is still broken. Best case you’re serving stale data.
Jotalea•6mo ago
It says that the backend is down, I guess I'll have to wait. Hope I don't forget about it before then.
p3rls•6mo ago
It's great being an independent site in 2025.

You get fucked by Google promoting AIOs and Hindustan Times articles for everything in your niche on the one hand, then these scrapers knocking your server offline on the other.