Open Source 1.7tb Dataset of What AI Crawlers Are Doing

https://huggingface.co/datasets/lee101/webfiddle-internet-raw-cache-dataset
10•catsanddogsart•8h ago

Comments

jauntywundrkind•6h ago
This potentially is so awesome!

In the submission on Cloudflare adding AI blocking, one of my asks was for better tools to do rate limiting (rather than adding client pain with Anubis). AI crawlers are alleged to be pretty merciless about changing their identity (IP address, user agent) when rate limited, but with data sets like this, I feel like we stand a chance of building tools to analyze their behavior, and of building rate limiter systems that still function against these adversarial forces (without penalizing regular users). https://news.ycombinator.com/item?id=44443480
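A rough sketch of that idea, assuming some analysis step can assign each request a "behavior key" that stays stable when the IP address or user agent changes. How to derive such a key is exactly the open question, and nothing here comes from the dataset itself; `BehaviorRateLimiter` and `behavior_key` are hypothetical names. The limiter is a plain token bucket, keyed on that value instead of on the client's claimed identity:

```python
import time


class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled over time."""

    def __init__(self, capacity, refill_per_second, now):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full
        self.updated = now

    def allow(self, now):
        # Credit tokens for the time elapsed since the last call.
        elapsed = max(now - self.updated, 0.0)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class BehaviorRateLimiter:
    """One bucket per behavior key (hypothetical), not per IP or user agent."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, behavior_key, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(behavior_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[behavior_key] = bucket
        return bucket.allow(now)
```

Requests that map to the same key share a budget no matter how often the client rotates addresses, while regular users with different behavior land in their own buckets.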

It'd be awesome if we had an HTTP spec like GitHub's rate limit headers, so that we could just tell crawlers what we'll grant them. Sure, many crawlers would ignore it or try to bypass it. But there should in principle be some means for cooperation, a way to say what you will allow! We should be trying to coax good behaviors, but there are no protocols to set bounds for what good is. GitHub's done real good here, imo, and something like this should be enshrined, to hopefully help get server loads back to reasonable levels, to let the calm be enhanced.
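For reference, a minimal sketch of what a server could send today, modeled on the header names GitHub's REST API uses (`x-ratelimit-limit`, `x-ratelimit-remaining`, `x-ratelimit-reset`) plus the standard `Retry-After` header. The function name and the fixed-window accounting are assumptions for illustration, not any existing spec for crawlers:

```python
import time


def rate_limit_headers(limit, used, window_start, window_seconds, now=None):
    """Build GitHub-style rate limit headers for one client's current window.

    limit          -- requests allowed per window
    used           -- requests already counted in this window
    window_start   -- start of the window, in epoch seconds
    window_seconds -- window length in seconds
    """
    now = time.time() if now is None else now
    reset = int(window_start + window_seconds)  # epoch seconds when the window resets
    remaining = max(limit - used, 0)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset),
    }
    if remaining == 0:
        # Standard HTTP hint: how many seconds to wait before retrying.
        headers["Retry-After"] = str(max(reset - int(now), 0))
    return headers
```

A cooperative crawler could read `X-RateLimit-Remaining` and slow down before it ever hits zero; an uncooperative one would still have to be handled by enforcement on the server side.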
