
Web-scraping AI bots cause disruption for scientific databases and journals

https://www.nature.com/articles/d41586-025-01661-4
30•tchalla•1d ago

Comments

OutOfHere•1d ago
Requiring PoW (proof-of-work) could take over for simple requests, rejecting requests until a sufficient nonce is included in the request. Unfortunately, this collective PoW could burden power grids even more, wasting energy+money+computation for transmission. Such is life. It would be a lot better to just upgrade the servers, but that's never going to be sufficient.
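
To make the nonce idea concrete, here is a minimal hashcash-style sketch. It is an illustration only, not how Anubis or any other tool implements it; the function names, the `challenge:nonce` string format, and the 16-bit difficulty are all assumptions made for the example.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 16  # assumed difficulty: about 65k hashes on average to solve


def make_challenge() -> str:
    """Server side: issue a random challenge string to the client."""
    return secrets.token_hex(16)


def is_valid(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: one hash to check that the digest starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force nonces until one passes the check."""
    nonce = 0
    while not is_valid(challenge, nonce, bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge)
    print(challenge, nonce, is_valid(challenge, nonce))
```

The asymmetry is the point: the server spends one hash to verify, while the client spends many to solve, and each extra difficulty bit doubles the expected client work.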
Bjartr•1d ago
So, Anubis?

https://anubis.techaro.lol/

OutOfHere•1d ago
Yes, although the concept is simple enough in principle that a homegrown solution also works.
Zardoz84•1d ago
We are wasting power on feeding statistical parrots, and we need to waste additional power to avoid being DoSed by that feeding.

We would be better off without that useless waste of power.

treyd•1d ago
What do you suppose we as website owners do to prevent our websites from being DoSed in the meantime? And how do you suppose we convince/beg the corporations running AI scraping bots to be better users of the web?
OutOfHere•1d ago
This should be an easy question for an engineer. It depends on whether the constraint is CPU or memory or database or network.
zihotki•23h ago
Technology can't solve a human problem; the constraints are in budgets and in available time.
OutOfHere•5h ago
What human problem? Do tell -- how have sites handled search engine crawlers for the past few decades? Why are AI crawlers functionally different? It makes no sense because they aren't functionally different.
OutOfHere•3h ago
As of this year, AI has given people superpowers, doubling what they can achieve without it. Is this gain not enough? One can use it to run a more efficient web server.
jaoane•23h ago
Write proper websites that do not choke that easily.
HumanOstrich•17h ago
So I just need a solution with infinite compute, storage, and bandwidth. Got it.
jaoane•16h ago
That is not what I said and that is not what is necessary.

First of all, web developers should use Google and learn what a cache is. That way you don't need compute at all.

throwawayscrapd•15h ago
And maybe you could Bing and learn what "cache eviction" is and why that happens when a crawler systematically hits every page on your site.
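
The eviction effect can be sketched with a toy simulation. Everything here is an assumption chosen for illustration (a 100-entry LRU page cache, 50 popular pages, a crawler that walks 200 distinct archive pages between visits); it does not model any real site.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU page cache: evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> bool:
        """Return True on a hit; on a miss, render and store the page."""
        if url in self.pages:
            self.pages.move_to_end(url)  # mark as most recently used
            return True
        self.pages[url] = f"rendered {url}"
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # drop the least recently used page
        return False


def hot_hit_rate(with_crawler: bool, rounds: int = 20) -> float:
    """Hit rate seen by human visitors to the popular pages only."""
    cache = LRUCache(capacity=100)
    hot = [f"/popular/{i}" for i in range(50)]
    hits = total = 0
    for r in range(rounds):
        for url in hot:
            hits += cache.get(url)
            total += 1
        if with_crawler:
            # The crawler touches 200 distinct cold pages, flushing the cache.
            for i in range(200):
                cache.get(f"/archive/{r}/{i}")
    return hits / total


if __name__ == "__main__":
    print(hot_hit_rate(with_crawler=False))  # 0.95
    print(hot_hit_rate(with_crawler=True))   # 0.0
```

In this toy setup the popular pages fit comfortably in the cache, yet a single sequential pass over cold pages between visits is enough to push every one of them out.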
OutOfHere•5h ago
Maybe because it's an overly simplistic LRU cache, in which case a different eviction algorithm would be better.

It's funny really since Google and other search engines have been crawling sites for decades, but now that search engines have competition, sites are complaining.

OutOfHere•5h ago
How did you manage search engine crawlers for the past few decades? And why are AI crawlers functionally different? They aren't.
atonse•1d ago
How was this not a problem before with search engine crawlers?

Is this more of an issue with having 500 crawlers rather than any single one behaving badly?

Ndymium•1d ago
Search engine crawlers generally respected robots.txt and limited themselves to a trickle of requests, likely based on the relative popularity of the website. These bots do neither: they will crawl anything they can access and send enough requests per second to drown your server, especially if you're a self-hoster running your own little site on a dinky server.

Search engines never took my site down, these bots did.
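
For contrast, this is roughly what well-behaved crawling looks like on the client side, sketched with Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` agent name, the URLs, and the `polite_fetch_plan` helper are hypothetical; `Crawl-delay` is a widely used but non-standard directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a small site might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""

AGENT = "ExampleBot"  # hypothetical crawler name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_plan(parser: RobotFileParser, urls: list) -> list:
    """Keep only allowed URLs and pair each with the delay to wait before it."""
    delay = parser.crawl_delay(AGENT) or 1  # assumed fallback: 1 second
    return [(url, delay) for url in urls if parser.can_fetch(AGENT, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    plan = polite_fetch_plan(parser, [
        "https://example.org/data/1",
        "https://example.org/search?q=everything",
    ])
    for url, delay in plan:
        print(f"wait {delay}s, then fetch {url}")
```

A crawler following this plan skips the disallowed search endpoint and sends at most one request every ten seconds, which is the "trickle" described above.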

atonse•14h ago
Thanks for specifying the actual issue. As someone who hosts a bunch of sites, we're also seeing a spike in traffic, but we don't track user agents.
OutOfHere•5h ago
Maybe stop using an inefficient PHP/JavaScript/TypeScript server, and start using a more efficient Go/Rust/Nim/Zig server.
fogx•23h ago
Especially for image data libraries, why not provide the images as a dump instead? No need to crawl 3 million images if the download button is right there. Now put the file on a CDN or Google and you're golden.
HumanOstrich•17h ago
There are two immediate issues I see with that. First, you'll end up with bots downloading the dump over and over again. Second, for non-trivial amounts of data, you'll end up paying the CDN for bandwidth anyway.
throwawayscrapd•16h ago
I work on the kind of big online scientific database that this article is about.

100% of our data is available from a clearly marked "Download" page.

We still have scraper bots running through the whole site constantly.

We are not "golden".