
"Do Not Train" Meta Tags: The Robots.txt of AI – Will Anyone Respect Them?

5•alissa_v•9mo ago
I've been noticing more creators and platforms quietly adding things like <meta name="robots" content="noai"> to their pages - kind of like a robots.txt, but for LLMs. For those unfamiliar, robots.txt is a standard file websites use to tell search engines which pages they shouldn't crawl. These new "noai" tags serve a similar purpose, but for AI training models instead of search crawlers.
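To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler could detect such a tag before using a page. It relies only on Python's standard `html.parser`. The `noai` directive comes from the post above; treating `noimageai` as a companion directive, and the names `RobotsMetaParser` and `opts_out_of_ai_training`, are assumptions made for illustration.

```python
from html.parser import HTMLParser

# "noai" is the directive quoted in the post; "noimageai" is assumed here
# as a companion directive for images.
AI_OPT_OUT_DIRECTIVES = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values can be None for bare attributes.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def opts_out_of_ai_training(html: str) -> bool:
    """Return True if the page carries at least one AI opt-out directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & AI_OPT_OUT_DIRECTIVES)
```

Nothing in this check is enforced by the page itself: it only works if the crawler chooses to run it and act on the result.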

Some examples of platforms implementing these opt-out mechanisms:

- Sketchfab now offers creators an option to block AI training in their account settings

- DeviantArt pioneered these tags as part of their content protection approach

- ArtStation added both meta tags and updated their Terms of Service

- Shutterstock created a compensation model for contributors whose images are used in AI training

But here's where things get concerning - there's growing evidence these tags are being treated as optional suggestions rather than firm boundaries:

- Various creators have reported issues with these tags being ignored. For instance, a discussion on DeviantArt (https://www.deviantart.com/lumaris/journal/NoAI-meta-tag-is-NOT-honored-by-DA-941468316) documents cases where the tags weren't honored, with references to GitHub conversations showing implementation issues

- In a GitHub pull request for an image dataset tool (https://github.com/rom1504/img2dataset/pull/218), developers made respecting these tags optional rather than default, which one commenter described as having "gutted it so that we can wash our hands of responsibility without actually respecting anyone's wishes"

- Raptive Support, a company implementing these tags, admits they "are not yet an industry standard, and we cannot guarantee that any or all bots will respect them" (https://help.raptive.com/hc/en-us/articles/13764527993755-NoAI-Meta-Tag-FAQs)

- A proposal to the HTML standards body (https://github.com/whatwg/html/issues/9334) acknowledges these tags don't enforce consent and compliance "might not happen short of robust regulation"
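The dispute over defaults in the pull request above comes down to a single flag. The sketch below is a hypothetical downloader policy, not the code of any real tool: `CrawlPolicy`, `may_use_for_training`, and the directive set are illustrative names, and reading the opt-out from an `X-Robots-Tag` response header is an assumption about where the signal is delivered.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumed opt-out directives, for illustration only.
OPT_OUT_DIRECTIVES = frozenset({"noai", "noimageai"})


@dataclass(frozen=True)
class CrawlPolicy:
    # True means opt-outs are honored unless the operator turns this off.
    # Flipping the default to False makes respecting the signal optional.
    respect_opt_out: bool = True


def parse_directives(header_value: str) -> Set[str]:
    """Split a comma-separated header value into lowercase directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def may_use_for_training(
    x_robots_tag: Optional[str],
    policy: CrawlPolicy = CrawlPolicy(),
) -> bool:
    """Decide whether a fetched resource may be kept for training.

    Simplification: user-agent-scoped values such as "botname: noai"
    are not handled.
    """
    if not policy.respect_opt_out:
        return True  # operator chose to ignore opt-out signals
    if x_robots_tag is None:
        return True  # no signal present
    return not (parse_directives(x_robots_tag) & OPT_OUT_DIRECTIVES)
```

With the default policy, `may_use_for_training("noindex, noai")` is `False`; with `CrawlPolicy(respect_opt_out=False)` the same header is ignored. Which of the two a tool ships as its default decides whether the tag is honored in practice.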

Some creators have grown cynical: one prominent artist, David Revoy, announced they're abandoning tags like #NoAI because "the damage has already been done" and they "can't remove [their] art one by one from their database." (https://www.davidrevoy.com/article977/artificial-inteligence-why-i-ll-not-hashtag-my-art-humanart-humanmade-or-noai)

This raises several practical questions:

- Will this actually work in practice without enforcement mechanisms?

- Could it be legally enforceable down the line?

- Has anyone successfully used these tags to prevent unauthorized training?

Beyond the technical implementation, I think this points to a broader conversation about creator consent in the AI era. Is this more symbolic - a signal that people want some version of "AI consent" for the open web? Or could it evolve into an actual standard with teeth?

I'm curious if folks here have added something like this to their own websites or content. Have you implemented any technical measures to detect if your content is being used for training anyway? And for those working in AI: what's your take on respecting these kinds of opt-out signals?

Would love to hear what others think.

Comments

abhisek•9mo ago
I am not sure how this is any different from open source code being embedded in commercial applications. It’s really like a self-accelerating loop.

At least for OSS, usage defines value. When an OSS project is popular, enterprises notice it and begin to use it in their commercial applications.

alissa_v•9mo ago
I agree with your point about usage defining value in OSS - popular projects gain recognition, contributions, and opportunities through their adoption in commercial applications.

The critical difference, though, is consent. OSS creators explicitly choose licenses permitting commercial use - they opt in to sharing their work. Many content creators never made such a choice for AI training.

The current AI training paradigm doesn't even have a true opt-out model - it simply assumes everything is available. The noAI tags are attempting to create an opt-out mechanism where none previously existed. Without enforcement or standards adoption, though, these signals don't seem to have the same weight as established open source licenses.

There's also a significant difference in attribution. OSS creators receive clear attribution even when their work is used commercially. For creators whose work trains AI models, their contribution is blended and anonymized with no recognition pathway.

The core question is whether creating this opt-out approach is sufficient, or if AI training should move toward an opt-in model more similar to how open source licensing works.

BobbyTables2•9mo ago
No

alissa_v•9mo ago
Haha fair enough! Any particular reason why you think they won't be respected?

zzo38computer•9mo ago
I do not want others to scrape my files from my server for the purpose of training LLMs, but if they acquire a copy of them by other means or already have a copy of them for other reasons, then they will already have a copy and then they can do what they want with it.

I do not care about attribution; but I care more that they do not claim additional restrictions in their terms of use when they copy my stuff and use it.

nicbou•9mo ago
They already started with the assumption of consent, crawled the web with disregard for resource use, and still provide no mechanism to revoke permission. This is the culture around AI. A quiet little tag that says "please don't do that" won't do much.

These companies are already behaving like jerks. Do you think they will become more polite once they control how we access information? With investors breathing down their necks?

Ukv•9mo ago
Of the signals used to indicate crawling is prohibited, robots.txt is probably the most effective; OpenAI, Google, Anthropic, Meta, and CommonCrawl all claim to respect it. That often provokes a response of "well they're lying", but I've yet to actually find any cases of the IPs they use for crawling accessing content prohibited by robots.txt.
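For reference, this is roughly what "respecting robots.txt" means on the crawler side, sketched with Python's standard `urllib.robotparser`. `GPTBot` and `CCBot` are the user-agent tokens published by OpenAI and Common Crawl; the rules, the `example.com` URLs, and the helper name `allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely, block another
# from a single directory, and allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /gallery/

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/art/1.png"),
        ("CCBot", "https://example.com/gallery/1.png"),
        ("CCBot", "https://example.com/blog/post"),
        ("Mozilla", "https://example.com/art/1.png"),
    ]:
        print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

The check runs entirely in the crawler, so the file only describes what a compliant crawler will do. Verifying compliance means comparing server logs against the published crawler IP ranges, as described above.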

Newly proposed standards will probably take a while to catch on, if they ever do.

Not a lawyer, but I believe such measures could in theory become legally enforceable in the US without any new legislation if the fair use defense fails but an implied license defense (the reason you can cache/rehost copies of webpages that don't have a "noarchive" meta tag, as in Field v. Google Inc) succeeds.