
Ask HN: Estimation of copyright material used by LLM

6•megamix•3mo ago
1. Is it true that LLMs / AI Companies have used copyrighted material for training?

2. Is it possible to estimate how much copyrighted material has been used?

Comments

muzani•3mo ago
1. Yes, but it's hard to prove. There are active lawsuits. Some of it has been defended as "fair use", but at the billion-dollar scale you have to really ask whether it's fair. Also, anecdotally, an author friend lamented that her publisher sold the legal rights to use her work. It was all perfectly legal, but many authors never agreed to this.

2. This is harder, as a lot of companies don't disclose their training sets.

dialup_sounds•3mo ago
I think what you're looking for is not "copyrighted material" but material that's both 1) used without permission and 2) outside the scope of fair use.

There's no easy answer there, hence New York Times v. OpenAI.

MrVandemar•3mo ago
There is an easy answer; it's just obfuscated by powerful people who are benefiting from it an obscene amount, and supported by hordes of addled and thoroughly addicted enthusiasts.

I think sticking a straw in Zlib or AA or LibGen or whatever it is, and drinking until it makes gurgling slurping noises as it hoovers up the dregs at the bottom of the barrel, is far, far removed from “fair use”.

marstall•3mo ago
Pretty much everything newer than ~70 years old on the internet is copyrighted, because copyright attaches automatically when you create something (in the US at least). So the answer to #1 is yes.
maxloh•3mo ago
Yeah. The whole point is whether you obtained the data legally. The answer is yes for data scraped from the public internet, but no for pirated content, hence the Anthropic lawsuits: https://www-cbsnews-com.cdn.ampproject.org/v/s/www.cbsnews.c...
bjourne•3mo ago
1. Yes 2. No
yincong0822•3mo ago
Large Language Models (LLMs) and AI companies routinely use massive amounts of data for training, much of which is likely to contain copyrighted material.
maxloh•3mo ago
Big Tech companies crawl the internet for training data, which makes it easy for them to download copyrighted data by accident.

For example, most popular textbooks have at least several pirated copies uploaded to the web. Some of them are even in plain sight and Googleable.