frontpage.

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
1•meszmate•1m ago•0 comments

Metaphor+Metonymy: "To love that well which thou must leave ere long"(Sonnet73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•3m ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•18m ago•1 comments

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•23m ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•27m ago•1 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
2•gmays•29m ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•30m ago•1 comments

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•34m ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•37m ago•1 comments

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•40m ago•0 comments

The Search Engine Map

https://www.searchenginemap.com
1•cratermoon•47m ago•0 comments

Show HN: Souls.directory – SOUL.md templates for AI agent personalities

https://souls.directory
1•thedaviddias•48m ago•0 comments

Real-Time ETL for Enterprise-Grade Data Integration

https://tabsdata.com
1•teleforce•51m ago•0 comments

Economics Puzzle Leads to a New Understanding of a Fundamental Law of Physics

https://www.caltech.edu/about/news/economics-puzzle-leads-to-a-new-understanding-of-a-fundamental...
3•geox•53m ago•0 comments

Switzerland's Extraordinary Medieval Library

https://www.bbc.com/travel/article/20260202-inside-switzerlands-extraordinary-medieval-library
2•bookmtn•53m ago•0 comments

A new comet was just discovered. Will it be visible in broad daylight?

https://phys.org/news/2026-02-comet-visible-broad-daylight.html
3•bookmtn•58m ago•0 comments

ESR: Comes the news that Anthropic has vibecoded a C compiler

https://twitter.com/esrtweet/status/2019562859978539342
2•tjr•59m ago•0 comments

Frisco residents divided over H-1B visas, 'Indian takeover' at council meeting

https://www.dallasnews.com/news/politics/2026/02/04/frisco-residents-divided-over-h-1b-visas-indi...
3•alephnerd•1h ago•3 comments

If CNN Covered Star Wars

https://www.youtube.com/watch?v=vArJg_SU4Lc
1•keepamovin•1h ago•1 comments

Show HN: I built the first tool to configure VPSs without commands

https://the-ultimate-tool-for-configuring-vps.wiar8.com/
2•Wiar8•1h ago•3 comments

AI agents from 4 labs predicting the Super Bowl via prediction market

https://agoramarket.ai/
1•kevinswint•1h ago•1 comments

EU bans infinite scroll and autoplay in TikTok case

https://twitter.com/HennaVirkkunen/status/2019730270279356658
6•miohtama•1h ago•5 comments

Benchmarking how well LLMs can play FizzBuzz

https://huggingface.co/spaces/venkatasg/fizzbuzz-bench
1•_venkatasg•1h ago•1 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
19•SerCe•1h ago•14 comments

Octave GTM MCP Server

https://docs.octavehq.com/mcp/overview
1•connor11528•1h ago•0 comments

Show HN: Portview what's on your ports (diagnostic-first, single binary, Linux)

https://github.com/Mapika/portview
3•Mapika•1h ago•0 comments

Voyager CEO says space data center cooling problem still needs to be solved

https://www.cnbc.com/2026/02/05/amazon-amzn-q4-earnings-report-2025.html
1•belter•1h ago•0 comments

Boilerplate Tax – Ranking popular programming languages by density

https://boyter.org/posts/boilerplate-tax-ranking-popular-languages-by-density/
1•nnx•1h ago•0 comments

Zen: A Browser You Can Love

https://joeblu.com/blog/2026_02_zen-a-browser-you-can-love/
1•joeblubaugh•1h ago•0 comments

My GPT-5.3-Codex Review: Full Autonomy Has Arrived

https://shumer.dev/gpt53-codex-review
2•gfortaine•1h ago•0 comments

Web-scraping AI bots cause disruption for scientific databases and journals

https://www.nature.com/articles/d41586-025-01661-4
31•tchalla•8mo ago

Comments

OutOfHere•8mo ago
Requiring PoW (proof-of-work) could handle simple requests, rejecting each request until a sufficient nonce is included with it. Unfortunately, this collective PoW could burden power grids even more, wasting energy, money, and computation on every transmission. Such is life. It would be a lot better to just upgrade the servers, but that's never going to be sufficient.
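
(For concreteness, a minimal sketch of the hashcash-style check described above, in Go. The challenge string, difficulty, and function names are illustrative; a real deployment would issue random, signed, expiring challenges rather than a fixed one.)

```go
// Minimal hashcash-style check: the client must find a nonce such that
// SHA-256(challenge || nonce) starts with `difficulty` zero bits.
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// hasLeadingZeroBits reports whether the first n bits of sum are zero.
func hasLeadingZeroBits(sum [32]byte, n int) bool {
	for _, b := range sum {
		if n <= 0 {
			return true
		}
		z := bits.LeadingZeros8(b)
		if n <= 8 {
			return z >= n
		}
		if z < 8 {
			return false
		}
		n -= 8
	}
	return n <= 0
}

// verify checks a client-supplied nonce against a server-issued challenge.
func verify(challenge, nonce string, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + nonce))
	return hasLeadingZeroBits(sum, difficulty)
}

func main() {
	// The server would reject requests until verify returns true;
	// the client brute-forces nonces until it finds one that passes.
	challenge := "example-challenge"
	for i := 0; ; i++ {
		nonce := fmt.Sprintf("%d", i)
		if verify(challenge, nonce, 16) { // 16 zero bits: ~65k hashes on average
			fmt.Println("found nonce:", nonce)
			break
		}
	}
}
```
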
Bjartr•8mo ago
So, Anubis?

https://anubis.techaro.lol/

OutOfHere•8mo ago
Yes, although the concept is simple enough in principle that a homegrown solution also works.
Zardoz84•8mo ago
We are wasting power on feeding statistics parrots, and we need to waste additional power to avoid being DoSed by that feeding.

We would be better off without that useless waste of power.

treyd•8mo ago
What do you suppose we as website owners do to prevent our websites from being DoSed in the meantime? And how do you suppose we convince/beg the corporations running AI scraping bots to be better users of the web?
OutOfHere•8mo ago
This should be an easy question for an engineer. It depends on whether the constraint is CPU or memory or database or network.
zihotki•8mo ago
Technology can't solve a human problem; the constraints are in budgets and in available time.
OutOfHere•8mo ago
What human problem? Do tell -- how have sites handled search engine crawlers for the past few decades? Why are AI crawlers functionally different? It makes no sense, because they aren't functionally different.
OutOfHere•8mo ago
As of this year, AI has given people superpowers, doubling what they can achieve without it. Is this gain not enough? One can use it to run a more efficient web server.
jaoane•8mo ago
Write proper websites that do not choke that easily.
HumanOstrich•8mo ago
So I just need a solution with infinite compute, storage, and bandwidth. Got it.
jaoane•8mo ago
That is not what I said and that is not what is necessary.

First of all, web developers should use Google and learn what a cache is. That way you don't need compute at all.

throwawayscrapd•8mo ago
And maybe you could Bing and learn what "cache eviction" is and why that happens when a crawler systematically hits every page on your site.
OutOfHere•8mo ago
Maybe because it's an overly simplistic LRU cache, in which case a different eviction algorithm would be better.

It's funny really since Google and other search engines have been crawling sites for decades, but now that search engines have competition, sites are complaining.
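
(A hedged sketch of what a scan-resistant alternative could look like: admit a page to the cache only on its second request within a window, so a crawler that touches every URL exactly once never displaces the genuinely popular entries. The names are hypothetical and the unbounded maps are for illustration only; real implementations use bounded structures such as 2Q or TinyLFU-style admission.)

```go
// Sketch of a scan-resistant admission policy: a page is only admitted to
// the cache on its second request within a window, so a one-pass crawl
// cannot evict the pages real users keep asking for.
package main

import (
	"sync"
	"time"
)

type admitCache struct {
	mu    sync.Mutex
	seen  map[string]time.Time // first-hit timestamps (candidate set)
	cache map[string][]byte    // admitted entries (use a real bounded LRU here)
	ttl   time.Duration
}

func newAdmitCache(ttl time.Duration) *admitCache {
	return &admitCache{
		seen:  map[string]time.Time{},
		cache: map[string][]byte{},
		ttl:   ttl,
	}
}

func (c *admitCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.cache[key]
	return v, ok
}

// Put stores the value only if the key was already requested recently;
// otherwise it just records the key as a candidate for next time.
func (c *admitCache) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.seen[key]; ok && time.Since(t) < c.ttl {
		c.cache[key] = value
		delete(c.seen, key)
		return
	}
	c.seen[key] = time.Now()
}
```
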

OutOfHere•8mo ago
How did you manage search engine crawlers for the past few decades? And why are AI crawlers functionally different? They aren't.
jakderrida•7mo ago
If I'm being honest... I expect the websites to keep returning errors, and I hope that those who employ you at least start to understand what's going on.
atonse•8mo ago
How was this not a problem before with search engine crawlers?

Is this more of an issue with having 500 crawlers rather than any single one behaving badly?

Ndymium•8mo ago
Search engine crawlers generally respected robots.txt and limited themselves to a trickle of requests, likely based on the relative popularity of the website. These bots do neither: they will crawl anything they can access and send enough requests per second to drown your server, especially if you're a self-hoster running your own little site on a dinky server.

Search engines never took my site down, these bots did.
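
(For self-hosters in this situation, one common first line of defense is per-IP rate limiting in front of the application. The sketch below uses Go's golang.org/x/time/rate token buckets; the limits are made up, and it only helps against bots hammering from a small set of addresses. Crawlers that rotate across many IPs are part of why people reach for proof-of-work walls like Anubis instead.)

```go
// Minimal per-IP rate limiting middleware: each client IP gets a token
// bucket; requests beyond the bucket get 429 instead of reaching the app.
package main

import (
	"net"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

// limiterFor returns (creating if needed) the token bucket for one IP.
// Note: entries are never evicted here; a real version would expire idle IPs.
func limiterFor(ip string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[ip]
	if !ok {
		l = rate.NewLimiter(rate.Limit(2), 10) // ~2 req/s with a burst of 10
		limiters[ip] = l
	}
	return l
}

func rateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !limiterFor(ip).Allow() {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	http.ListenAndServe(":8080", rateLimit(mux))
}
```
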

atonse•8mo ago
Thanks for specifying the actual issue. We host a bunch of sites and are also seeing a spike in traffic, but we don't track user agents.
OutOfHere•8mo ago
Maybe stop using an inefficient PHP/JavaScript/TypeScript server and start using a more efficient Go/Rust/Nim/Zig server.
Ndymium•7mo ago
Personally, I'm specifically talking about Forgejo, which is written in Go but calls out to git for some operations. And the effect that was worse than pegging all the CPUs at 100% was the disk filling up with generated zip archives of all the commits of all public repositories.

Sure, we can say that Forgejo should have had better defaults for this (the default was to clear archives after 24 hours). And that your site should be fast, run on an efficient server, and not have any even slightly expensive public endpoints. But in the end that is all victim blaming.

One of the nice parts of the web for me is that as long as I have a public IP address, I can use any dinky cheapo server I have and run my own infra on it. I don't need to rely on big players to do this for me. Sure, sometimes there's griefers/trolls out there, but generally they don't bother you. No one was ever interested in my little server, and search engines played fair (and to my knowledge still do) while still allowing my site to be discoverable.

Dealing with these bots is the first time my server has been consistently attacked. I can deal with them for now, but it is an additional thing to deal with, and suddenly this idea of easy self-hosting on low-powered hardware is no longer so feasible. That makes me sad. I know what I should do about it, but I wish I didn't have to.

OutOfHere•7mo ago
That is why I require authorization for expensive endpoints. Everything else can often be just an inexpensive cache hit.
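
(A minimal sketch of that split, assuming a Go backend: expensive endpoints demand a token, while cheap pages get Cache-Control headers so a CDN or reverse proxy can absorb repeat bot requests. The token check, header values, and route names are placeholders, not anything from the thread.)

```go
// Expensive endpoints require authorization; everything else is served with
// cache headers so an intermediary can answer repeat requests cheaply.
package main

import "net/http"

// requireToken is a hypothetical check; a real site would verify a session,
// API key, or signed token instead of a static string.
func requireToken(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer example-token" {
			http.Error(w, "authorization required", http.StatusUnauthorized)
			return
		}
		next(w, r)
	}
}

func main() {
	mux := http.NewServeMux()

	// Cheap, public page: let CDNs and proxies cache it.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=3600")
		w.Write([]byte("catalog page"))
	})

	// Expensive endpoint (search, zip export, etc.): authorization required.
	mux.HandleFunc("/export", requireToken(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("large export"))
	}))

	http.ListenAndServe(":8080", mux)
}
```
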
fogx•8mo ago
Especially for image data libraries, why not provide the images as a dump instead? No need to crawl 3 million images if the download button is right there. Put the file on a CDN or Google and you're golden.
HumanOstrich•8mo ago
There are two immediate issues I see with that. First, you'll end up with bots downloading the dump over and over again. Second, for non-trivial amounts of data, you'll end up paying the CDN for bandwidth anyway.
throwawayscrapd•8mo ago
I work on the kind of big online scientific database that this article is about.

100% of our data is available from a clearly marked "Download" page.

We still have scraper bots running through the whole site constantly.

We are not "golden".