The Grug Brained Developer (2022)

https://grugbrain.dev/
492•smartmic•5h ago•159 comments

Honda conducts successful launch and landing of experimental reusable rocket

https://global.honda/en/topics/2025/c_2025-06-17ceng.html
830•LorenDB•10h ago•248 comments

Resurrecting a dead torrent tracker and finding 3M peers

https://kianbradley.com/2025/06/15/resurrecting-a-dead-tracker.html
371•k-ian•8h ago•109 comments

Bzip2 crate switches from C to 100% Rust

https://trifectatech.org/blog/bzip2-crate-switches-from-c-to-rust/
155•Bogdanp•5h ago•61 comments

3D-printed device splits white noise into an acoustic rainbow without power

https://phys.org/news/2025-06-3d-device-white-noise-acoustic.html
62•rbanffy•2d ago•6 comments

Building Effective AI Agents

https://www.anthropic.com/engineering/building-effective-agents
257•Anon84•8h ago•51 comments

Dinesh's Mid-Summer Death Valley Walk (1998)

https://dineshdesai.info/dv/photos.html
21•wonger_•1h ago•8 comments

AMD's CDNA 4 Architecture Announcement

https://chipsandcheese.com/p/amds-cdna-4-architecture-announcement
112•rbanffy•8h ago•22 comments

LLMs pose an interesting problem for DSL designers

https://kirancodes.me/posts/log-lang-design-llms.html
114•gopiandcode•6h ago•88 comments

Show HN: I made an online Unicode Cuneiform digital clock

https://oisinmoran.com/sumertime
35•OisinMoran•2d ago•10 comments

I Wrote a Compiler

https://blog.singleton.io/posts/2021-01-31-i-wrote-a-compiler/
18•ingve•2d ago•9 comments

Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite

https://blog.google/products/gemini/gemini-2-5-model-family-expands/
270•meetpateltech•9h ago•165 comments

Why JPEGs still rule the web (2024)

https://spectrum.ieee.org/jpeg-image-format-history
140•purpleko•11h ago•242 comments

Time Series Forecasting with Graph Transformers

https://kumo.ai/research/time-series-forecasting/
72•turntable_pride•7h ago•26 comments

Now might be the best time to learn software development

https://substack.com/home/post/p-165655726
130•nathanfig•11h ago•78 comments

Foundry (YC F24) Hiring Early Engineer to Build Web Agent Infrastructure

https://www.ycombinator.com/companies/foundry/jobs/azAgJbN-foundry-software-engineer-new-grad-to-mid-level
1•lakabimanil•4h ago

What Google Translate Can Tell Us About Vibecoding

https://ingrids.space/posts/what-google-translate-can-tell-us-about-vibecoding/
85•todsacerdoti•6h ago•54 comments

Iran asks its people to delete WhatsApp from their devices

https://apnews.com/article/iran-whatsapp-meta-israel-d9e6fe43280123c9963802e6f10ac8d1
209•rdrd•6h ago•243 comments

Should we design for iffy internet?

https://bytes.zone/posts/should-we-design-for-iffy-internet/
164•surprisetalk•12h ago•153 comments

Proofs Without Words

https://artofproblemsolving.com/wiki/index.php/Proofs_without_words
4•squircle•3d ago•0 comments

After millions of years, why are carnivorous plants still so small?

https://www.smithsonianmag.com/articles/carnivorous-plants-have-been-trapping-animals-for-millions-of-years-so-why-have-they-never-grown-larger-180986708/
43•gmays•4d ago•18 comments

Tetrachromatic Vision

https://www.bookofjoe.com/2025/05/my-entry-32.html
28•surprisetalk•3d ago•20 comments

The hamburger-menu icon today: Is it recognizable?

https://www.nngroup.com/articles/hamburger-menu-icon-recognizability/
73•thm•11h ago•131 comments

AMD's Pre-Zen Interconnect: Testing Trinity's Northbridge

https://chipsandcheese.com/p/amds-pre-zen-interconnect-testing
101•zdw•3d ago•18 comments

Fujifilm X half: Is it the perfect family camera?

https://arslan.io/2025/06/14/fujifilm-x-half-is-it-the-perfect-family-camera/
47•farslan•3d ago•68 comments

The magic of through running

https://www.worksinprogress.news/p/the-magic-of-through-running
162•ortegaygasset•16h ago•103 comments

US Streetlights Are Turning Purple

https://www.scientificamerican.com/article/streetlights-are-mysteriously-turning-purple-heres-why/
66•surprisetalk•4d ago•77 comments

Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone

https://arxiv.org/abs/2506.02774
44•PaulHoule•12h ago•15 comments

What happens when clergy take psilocybin

https://nautil.us/clergy-blown-away-by-psilocybin-1217112/
332•bookofjoe•1d ago•490 comments

KiCad and Wayland Support

https://www.kicad.org/blog/2025/06/KiCad-and-Wayland-Support/
111•xvilka•15h ago•81 comments

Bots are overwhelming websites with their hunger for AI data

https://www.theregister.com/2025/06/17/bot_overwhelming_websites_report/
24•Bender•4h ago

Comments

tartoran•4h ago
RIP internet. It will soon make no sense to share something with the world unless you're in for profit. But who's gonna pay for it?
superkuh•4h ago
While catchy, that headline kind of misses the point. It should be "Corporations are overwhelming websites with their hunger for AI data". They're the ones doing it, and corporations are by far the most damaging non-human persons (especially since they are formed nowadays to abstract away liability for the damage they cause).

This is not some new enemy called "bots". These are the same old non-human legal persons that polluted our physical world, now repeating the pattern in the digital one. Bots run by actual human persons are not the problem.

Analemma_•4h ago
I'm not sure that's true. As hardware gets cheaper, you're going to see more and more people wanting to build+deploy their own personal LLMs to avoid the guardrails/censorship (or just the cost) of the commercial ones, and that means scraping the internet themselves. I suspect the amount of scraping that's coming from individuals or small projects is going to increase dramatically in the months/years to come.
johnea•4h ago
This is an ever growing problem.

The model of the web host paying for all bandwidth was somewhat aligned with traditional usage models, but the wave of scraping for training data is disrupting this logic.

I remember reading, maybe 10 years ago, that backend website communications (ads and demographic data sharing) had surpassed the bandwidth consumed by actual users. But even in that case, the traffic was still primarily linked to the website hosts.

Whereas with the recent scraping frenzy, the traffic is purely client side, not initiated by actual website users, and not particularly beneficial to the website host.

One has to wonder what percentage of web traffic is now generated by actual users, versus host backend data sharing and the mammoth new wave of scraping.

CSMastermind•4h ago
What's the solution here? Metered usage based on network traffic that gets shared with the website owners?

Otherwise everything moves behind a paywall?

Analemma_•4h ago
For now the solution is proof-of-work systems like Anubis combined with cookie-based rate limiting: you get throttled if your session cookie indicates you scraped here before, and if you throw the cookie out you get the POW challenge again. I don't know how long this will continue to work, but for my site at least it seems to be holding back the deluge, for the moment.
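
A minimal sketch of the combination described above, assuming a hashcash-style SHA-256 puzzle and a fixed-window counter keyed by session cookie. This is not how Anubis itself is implemented; `make_challenge`, `solve`, `is_valid`, `SessionThrottle`, and the difficulty value are hypothetical names and numbers chosen for illustration.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 12  # hypothetical: leading zero bits required in the hash


def make_challenge() -> str:
    """Server side: issue a random challenge string to a cookieless client."""
    return secrets.token_hex(16)


def is_valid(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: cheap check that the hash has `bits` leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce, which is the expensive part."""
    nonce = 0
    while not is_valid(challenge, nonce, bits):
        nonce += 1
    return nonce


class SessionThrottle:
    """Fixed-window request counter keyed by session cookie."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._state: dict[str, tuple[float, int]] = {}

    def allow(self, cookie: str, now: float) -> bool:
        start, count = self._state.get(cookie, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired, start a new one
        count += 1
        self._state[cookie] = (start, count)
        return count <= self.limit
```

In this sketch a client that keeps its cookie is throttled by `SessionThrottle`, and a client that discards the cookie has to call `solve` again before getting a new one, so either path costs the scraper something.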
the_snooze•2h ago
>Otherwise everything moves behind a paywall?

Basically. Paywalls and private services. Do things that are anti-scale, because things meant for consumption at scale will inevitably draw parasites.

rglover•4h ago
> Some of the bots identify themselves, but some don't. Either way, the respondents say that robots.txt directives – voluntary behavior guidelines that web publishers post for web crawlers – are not currently effective at controlling bot swarms.

Is anybody tracking the IP ranges of bots or anything similar that's reliable?

It seems like they're taking the "what are you gonna do about it" approach to this.

Edit: Yes [1]

[1] https://github.com/FabrizioCafolla/openai-crawlers-ip-ranges
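
For illustration, a published list of crawler ranges could be checked with Python's standard `ipaddress` module, roughly as follows. The CIDR blocks here are reserved documentation ranges used as placeholders, not real crawler ranges, and `is_listed_crawler` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real crawler ranges.
# In practice these would be loaded from a maintained list.
CRAWLER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "2001:db8::/32")
]


def is_listed_crawler(addr: str) -> bool:
    """Return True if the address falls inside any listed range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CRAWLER_RANGES)
```

A check like this only catches crawlers that connect from the ranges on the list.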

dbmikus•3h ago
Many bots use residential IP proxy networks, so they come from the same IPs that humans use
josefritzishere•3h ago
I think the solution is criminal penalties.
darekkay•3h ago
ai.robots.txt contains a big list of AI crawlers to block, either through robots.txt or via server rules:

https://github.com/ai-robots-txt/ai.robots.tx
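
As a rough sketch of what such a robots.txt rule looks like to a crawler that chooses to honor it, Python's standard `urllib.robotparser` can evaluate one. The two-rule file below is a made-up example rather than the contents of the linked list, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Made-up example rules: block one AI crawler user agent, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would run a check like this before each fetch.
blocked = not parser.can_fetch("GPTBot", "https://example.com/page")
others_ok = parser.can_fetch("Mozilla/5.0", "https://example.com/page")
```

This only works if the crawler runs the check at all, which is why the list also offers server rules.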

Bender•3h ago
Your link is missing the t at the end of .txt. You should be able to edit it though.
millipede•3h ago
Information is valuable; we just weren't charging for it. AI is just bringing the market for knowledge back into equilibrium.
dehrmann•48m ago
It looks more like information is valuable in aggregate.
gnabgib•3h ago
Original source: https://www.glamelab.org/products/are-ai-bots-knocking-cultu... (https://news.ycombinator.com/item?id=44298771)
pleeb•1h ago
I run a fairly large forum, and I've been getting emails from Linode that the CPU usage has been going over 90% multiple times a day. Users have been complaining that the site has been taking up to five or six seconds to load. I checked the logs and kept getting hit with hundreds of connections a second from specific addresses, so I set up rate limiting with Cloudflare.

I thought everything was going well after that, until suddenly it started getting even worse. I started realizing that instead of one IP hitting the site a hundred times per second, it was now hundreds of IPs hitting the site slightly below the throttling threshold I had set up.
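
A toy illustration of why a per-IP limit stops helping once the traffic is spread out. All numbers and addresses are made up, and `admitted` is a hypothetical helper, not anything Cloudflare exposes.

```python
from collections import Counter


def admitted(requests: list[str], per_ip_limit: int) -> list[str]:
    """Return the requests a per-IP limit lets through in one window."""
    seen: Counter[str] = Counter()
    passed = []
    for ip in requests:
        seen[ip] += 1
        if seen[ip] <= per_ip_limit:
            passed.append(ip)
    return passed


# 300 requests in one window from a single address...
single = ["203.0.113.7"] * 300
# ...versus the same 300 requests spread over 200 addresses.
spread = [f"198.51.100.{i % 200}" for i in range(300)]

single_passed = len(admitted(single, per_ip_limit=100))
spread_passed = len(admitted(spread, per_ip_limit=100))
```

With a limit of 100 per address, the single-source burst is cut to 100 requests, while the spread-out version passes all 300 because no individual address comes near the limit.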

dehrmann•50m ago
Can you serve cached data to logged-out users?
dehrmann•51m ago
Who's doing this at such a high volume? Most of the data is static enough that there isn't value in frequent crawls, crawls are (probably) more expensive than caching, and small shops and hobbyists don't have the resources to move the needle.