
Turn any website into an API

https://www.parse.bot
105•pcl•6mo ago

Comments

runningmike•6mo ago
Nice idea. In practice, many sites use different methods to prevent scraping. There's a large risk in doing things manually, imho.
renegat0x0•6mo ago
Huh, I have been working on a solution to that problem.

My project lets you define rules for various sites, so eventually everything is scraped correctly. For YouTube, yt-dlp is also used to augment the results.

I can crawl using requests, Selenium, HTTPX and others. The response is JSON, so it's easy to process.

The downside is that it may not be the fastest solution, and I have not tested it against proxies.

https://github.com/rumca-js/crawler-buddy

with•6mo ago
pretty cool idea. using stagehand under the hood?
vin047•6mo ago
No information on pricing on the site.
thrdbndndn•6mo ago
I scrape website content regularly (usually as one-offs) and have a hand-crafted extractor template where I just fill in a few arguments (mainly CSS selectors and some options) to get it working quickly. These days, I do sometimes ask AI to do this for me by giving it the HTML.

The issue is that for any serious use of this concept, some manual adjustment is almost always needed. This service says, "Refine your scraper at any time by chatting with the AI agent," but from what I can tell, you can't actually see the code it generates.

Relying solely on the results and asking the AI to tweak them can work, but often the output is too tailored to a specific page and fails to generalize (essentially "overfitting"). And surprisingly, this back-and-forth can be more tedious and time-consuming than just editing a few lines of code yourself. Also, if you can't directly edit the code behind the scenes, there are situations where you'll never get the exact result you want, no matter how much you try to explain it to the AI in natural language.
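For reference, here is a minimal sketch of the kind of fill-in-the-arguments extractor template described above, written against Python's standard-library `html.parser` so it has no dependencies. The names `extract`, `TextExtractor` and `parse_selector` are hypothetical, and it supports only a tiny selector subset (`tag`, `.class`, `#id`, `tag.class`, `tag#id`), not real CSS; a production template would more likely sit on a proper selector engine.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_selector(selector):
    """Split a simple selector ("tag", ".cls", "#id", "tag.cls", "tag#id")."""
    tag, cls, ident = selector, None, None
    if "#" in selector:
        tag, ident = selector.split("#", 1)
    elif "." in selector:
        tag, cls = selector.split(".", 1)
    return (tag or None), cls, ident


class TextExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every element matching a selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls, self.ident = parse_selector(selector)
        self.stack = []    # names of currently open elements
        self.active = []   # [depth, text_parts] for each open matching element
        self.results = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.tag and tag != self.tag:
            return False
        if self.cls and self.cls not in (attrs.get("class") or "").split():
            return False
        if self.ident and attrs.get("id") != self.ident:
            return False
        return True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if self._matches(tag, attrs):
            self.active.append([len(self.stack), []])

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag
        # Pop back to the matching open tag, closing any unclosed children too.
        while self.stack:
            closed = self.stack.pop()
            if self.active and self.active[-1][0] == len(self.stack) + 1:
                _, parts = self.active.pop()
                self.results.append(" ".join("".join(parts).split()))
            if closed == tag:
                break

    def handle_data(self, data):
        for _, parts in self.active:
            parts.append(data)


def extract(html, fields):
    """The "fill in a few arguments" part: map output field names to selectors."""
    out = {}
    for name, selector in fields.items():
        parser = TextExtractor(selector)
        parser.feed(html)
        parser.close()
        out[name] = parser.results
    return out


page = """
<ul>
  <li class="item"><span class="title">First post</span> <span class="score">105</span></li>
  <li class="item"><span class="title">Second post</span> <span class="score">7</span></li>
</ul>
"""
print(extract(page, {"titles": "span.title", "scores": ".score"}))
# {'titles': ['First post', 'Second post'], 'scores': ['105', '7']}
```

Because the whole extractor is a few visible lines, adjusting a selector is a one-line edit rather than another round of prompting.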

throwup238•6mo ago
I’ve had no shortage of trouble using LLMs for scrapers because, for some reason, they almost always ignore my instructions to use something other than the class name for selectors. They love to use hashed class names (from emotion/styled/whatever CSS-in-JS library du jour) that change way too often.
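One way to enforce that preference mechanically is to post-check selectors (whether hand-written or LLM-generated) and refuse class names that look generated. The sketch below assumes Emotion-style `css-<hash>` and styled-components-style `sc-<id>` naming; those regexes are heuristics that will miss other schemes and can misfire on hand-written names like `css-layout`. `looks_hashed` and `build_selector` are hypothetical helpers.

```python
import re

# Heuristic patterns for generated CSS-in-JS class names (assumptions, not exhaustive).
HASHED_CLASS_PATTERNS = [
    re.compile(r"^css-[a-z0-9]{5,}(-[A-Za-z0-9_]+)?$"),  # Emotion-style, e.g. css-1x2y3z
    re.compile(r"^sc-[A-Za-z0-9]{4,}$"),                 # styled-components-style, e.g. sc-bdVaJa
]

# Attributes that tend to survive redesigns better than class names, in priority order.
STABLE_ATTRS = ("data-testid", "data-test", "id", "itemprop", "name")


def looks_hashed(class_name):
    """True if a class name matches one of the generated-name heuristics."""
    return any(p.match(class_name) for p in HASHED_CLASS_PATTERNS)


def build_selector(tag, attrs):
    """Return a selector for one element, avoiding hashed class names.

    `attrs` is a dict of the element's attributes. Values are inserted unescaped,
    so this assumes ids and attribute values without CSS special characters.
    """
    for attr in STABLE_ATTRS:
        value = attrs.get(attr)
        if value:
            return f"{tag}#{value}" if attr == "id" else f'{tag}[{attr}="{value}"]'
    stable = [c for c in (attrs.get("class") or "").split() if not looks_hashed(c)]
    if stable:
        return tag + "".join(f".{c}" for c in stable)
    return None  # nothing stable: fall back to structure, or flag for a human


print(build_selector("h2", {"class": "css-1x2y3z product-title"}))  # h2.product-title
print(build_selector("a", {"class": "sc-bdVaJa", "data-testid": "buy-button"}))  # a[data-testid="buy-button"]
```

Returning `None` rather than a hashed selector makes the "no stable hook" case visible instead of quietly producing a scraper that breaks on the next deploy.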
websiteapi•6mo ago
I'm surprised (and could be wrong) that no one has made a Chrome extension that just controls a page and exposes the output to localhost for consumption as an API. Similar to using Chrome WebDriver, but without the setup.
ExxKA•6mo ago
Isn't that basically what browser-use is?
kevindamm•6mo ago
I kind of agree and don't. You could say HTTP+DOM is the API, we're already there. But it lacks the structure and a more explicit regularity (in part because it's meant for human consumption, not programming). And if you were to describe the whole protocol (including CSS and JS as they can change ordering, even content, of what's shown) it's incredibly more complicated than the equivalent, distilled representation.

There are efforts going back at least fifteen years to extract ontologies from natural language [0] and HTML structure [1].

[0]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... (2010) [PDF]

[1]: https://doi.org/10.1016/j.dss.2009.02.011 (2009)

meatjuice•6mo ago
It's not a browser extension, but controlling the actual browser without using webdriver is already a thing.

https://github.com/autoscrape-labs/pydoll
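The receiving half of the extension idea above could be as small as a local JSON server: a hypothetical content script POSTs whatever it extracted to `http://127.0.0.1:8765/pages/<key>`, and any client GETs the latest snapshot from the same path. This is a sketch using only the standard library; the port, the path scheme and the whole extension side (permissions, CORS, when to push) are assumptions not covered here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Latest snapshot per path, e.g. "/pages/hn" -> {...}, pushed by the extension.
SNAPSHOTS = {}
LOCK = threading.Lock()


class SnapshotHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # The extension side would POST its extracted data as a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        with LOCK:
            SNAPSHOTS[self.path] = payload
        self._send_json(200, {"ok": True})

    def do_GET(self):
        # API consumers read the most recent snapshot for a path.
        with LOCK:
            payload = SNAPSHOTS.get(self.path)
        if payload is None:
            self._send_json(404, {"error": "no snapshot for " + self.path})
        else:
            self._send_json(200, payload)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8765):
    # Bind to loopback only, so the "API" is not exposed to the network.
    return ThreadingHTTPServer(("127.0.0.1", port), SnapshotHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Keeping the server dumb (store and serve the last snapshot) leaves all page-specific logic in the browser, where the page is already rendered and logged in.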

_1tem•6mo ago
Way too little information on the homepage. Does this handle pagination? What about sites behind authentication? I assume the generated API is stable, i.e. the shape of the JSON will not change after a scraper is built, but what if the site changes its DOM? Does the scraper need to be regenerated? Does this attempt to defeat anti-bot and anti-scraper walls like Cloudflare?
ExxKA•6mo ago
No, no, it's good that it's simple to understand.

All those details can go in the docs / faqs section.

slightwinder•6mo ago
Where are those docs?
ExxKA•6mo ago
I really like the simplicity of the offering. The website looks great (to a human) and explains the API idea very simply. Good stuff!
verelo•6mo ago
Mobile UX is completely broken. This would be a 5-minute fix with Claude and Cursor. It signals to me that I can expect the backend to struggle with anything basic, like a CAPTCHA.
maticzav•6mo ago
i love the idea!

i know that https://expand.ai/ is doing something similar, maybe worth checking out

Joeboy•6mo ago
This is relevant to my interests[0]

Based on the website I was quite skeptical. It looks too much like an "indiehacker", minimum-almost-viable-product, fake-it-till-you-make-it, trolling-for-email-addresses kind of website.

But after a quick search on Twitter, it seems like people are actually using it and reporting good results. Maybe I'll take a proper look at it at some point.

I'd still like to know more about pricing and how it deals with Cloudflare challenges, non-semantic markup and other awkwardnesses.

[0] https://github.com/Joeboy/cinescrapers

artluko•6mo ago
I saw your video on YouTube, really impressive.
Aaargh20318•6mo ago
It’s a cute idea, but ultimately not very useful. An API is more than just an endpoint that gives easy-to-parse results. The most important part is that an API is a contract: it implies that things won’t suddenly break without prior announcement. Any form of web scraping, no matter how cleverly done, is inherently fragile. The site owner can change their front end for any reason, which could break your scraper. As such, you cannot rely on such an interface.
autonomousErwin•6mo ago
I wonder if not just checking the site every day (or minute ) would solve for this.

It's not necessarily the structure of the source data (the DOM, the HTML etc.) but rather the translator that needs to be contractually consistent. The translator in this case is the service for the endpoints.
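A minimal sketch of that idea, assuming the "translator" owns a fixed output contract: every scraped record is checked against it, extra fields are dropped, and a drifted shape raises an error instead of silently returning different JSON. `CONTRACT`, `enforce_contract` and `ContractViolation` are hypothetical names, and a check like this only catches shape changes, not a field whose meaning changes while its type stays the same.

```python
# Hypothetical contract for one endpoint: field name -> exact expected type.
CONTRACT = {"title": str, "price": float, "in_stock": bool}


class ContractViolation(Exception):
    """Raised instead of returning data whose shape has drifted."""


def enforce_contract(record, contract=CONTRACT):
    """Return only the contracted fields, or raise if any are missing or mistyped."""
    missing = [name for name in contract if name not in record]
    if missing:
        raise ContractViolation(f"missing fields: {missing}")
    # Exact type check, so e.g. True is not accepted where an int is expected.
    mistyped = [name for name, expected in contract.items()
                if type(record[name]) is not expected]
    if mistyped:
        raise ContractViolation(f"mistyped fields: {mistyped}")
    # Extra scraped fields are dropped so the response shape never grows silently.
    return {name: record[name] for name in contract}


scraped = {"title": "Chef knife", "price": 89.0, "in_stock": True, "banner": "Sale!"}
print(enforce_contract(scraped))
# {'title': 'Chef knife', 'price': 89.0, 'in_stock': True}
```

Failing loudly means consumers see an explicit error when the source page drifts, rather than quietly receiving differently shaped data.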

Aaargh20318•6mo ago
> I wonder if not just checking the site every day (or minute ) would solve for this.

No, because a webpage makes no promise not to change. Even if you check every minute, can your system handle random one-minute periods of unpredictable behavior? What if they remove data? What if the meaning of the data changes (e.g., instead of a maximum value for some field they now show the average value)? How would your system deal with that? What if they are running an A/B test and 10% of your ‘API’ requests return a different page?

This is not a technical problem and the solution is not a technical one. You need to have some kind of relationship with the entity whose data you are consuming or be okay with the fact that everything can just stop working at any random moment in time.

10000truths•6mo ago
That's just part and parcel of relying on third parties - you should always price in the maintenance burden of keeping up with potential changes on their end. That burden is a lot lower if the third party cooperates with you and provides an explicit contract and backwards compatibility, but it's still not zero.
Aaargh20318•6mo ago
It’s not about the maintenance cost, it’s about continuity of service. If you scrape a website things may break at any time. If you use a proper API and have a contract with the supplier you will have the opportunity to make any changes before things break.
hoppp•5mo ago
You download the HTML, hash it with SHA-512, then run the AI and the web scraping and cache the API content.

When the cache is invalidated, you refetch the HTML, check the SHA-512 hash to see if anything changed, then proceed based on yes or no.

Or something like that. It's not fast, but hashing and comparing is fast compared to inference anyway.
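The hash-then-cache flow described above might look roughly like this. It is a sketch under assumptions: `fetch` and `extract` are hypothetical callables standing in for the HTTP request and the AI/scraping step, and a fixed TTL stands in for whatever invalidation policy you actually use.

```python
import hashlib
import time


class ChangeAwareCache:
    """Cache extracted API data; on expiry, re-run extraction only if the page hash changed."""

    def __init__(self, fetch, extract, ttl_seconds=300, clock=time.monotonic):
        self.fetch = fetch        # url -> HTML string (hypothetical fetcher)
        self.extract = extract    # HTML -> data (stands in for the AI/scraping step)
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}         # url -> (sha512 hex digest, data, fetched_at)

    def get(self, url):
        now = self.clock()
        entry = self.entries.get(url)
        if entry and now - entry[2] < self.ttl:
            return entry[1]                      # still fresh: no fetch at all
        html = self.fetch(url)
        digest = hashlib.sha512(html.encode("utf-8")).hexdigest()
        if entry and entry[0] == digest:
            data = entry[1]                      # unchanged page: skip inference
        else:
            data = self.extract(html)            # first visit or changed page
        self.entries[url] = (digest, data, now)
        return data


pages = {"https://example.com/": "<p>v1</p>"}
cache = ChangeAwareCache(pages.get, lambda html: {"length": len(html)}, ttl_seconds=60)
print(cache.get("https://example.com/"))  # {'length': 9}
```

One caveat with hashing the raw HTML: many pages embed per-request values (timestamps, CSRF tokens, ad slots), so the digest may change on every fetch even when the content you care about hasn't. Hashing a normalized or trimmed version of the page would let the skip path fire more often.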

Aaargh20318•5mo ago
I’m not sure what that would solve? Your API call is still broken. Best case you’re serving stale data.
Jotalea•6mo ago
It says the backend is down, so I guess I'll have to wait. Hope I don't forget about it before then.
p3rls•6mo ago
It's great being an independent site in 2025.

You get fucked by Google promoting AIOs (AI Overviews) and Hindustan Times articles for everything in your niche on one side, then by these scrapers knocking your server offline on the other.