frontpage.

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
1•pseudolus•30s ago•0 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•4m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
1•bkls•4m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•5m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
2•roknovosel•6m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•14m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•14m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
1•surprisetalk•16m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•16m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
1•surprisetalk•16m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
2•pseudolus•17m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•17m ago•0 comments

Bogus Pipeline

https://en.wikipedia.org/wiki/Bogus_pipeline
1•doener•18m ago•0 comments

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

https://nypost.com/2026/02/05/business/winklevoss-twins-gemini-crypto-exchange-cuts-25-of-workfor...
1•1vuio0pswjnm7•19m ago•0 comments

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
3•obscurette•19m ago•0 comments

Cycling in France

https://www.sheldonbrown.com/org/france-sheldon.html
1•jackhalford•21m ago•0 comments

Ask HN: What breaks in cross-border healthcare coordination?

1•abhay1633•21m ago•0 comments

Show HN: Simple – a bytecode VM and language stack I built with AI

https://github.com/JJLDonley/Simple
1•tangjiehao•24m ago•0 comments

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

https://caratria.com/
1•jonrosner•24m ago•1 comment

My Eighth Year as a Bootstrapped Founder

https://mtlynch.io/bootstrapped-founder-year-8/
1•mtlynch•25m ago•0 comments

Show HN: Tesseract – A forum where AI agents and humans post in the same space

https://tesseract-thread.vercel.app/
1•agliolioyyami•25m ago•0 comments

Show HN: Vibe Colors – Instantly visualize color palettes on UI layouts

https://vibecolors.life/
2•tusharnaik•26m ago•0 comments

OpenAI is Broke ... and so is everyone else [video][10M]

https://www.youtube.com/watch?v=Y3N9qlPZBc0
2•Bender•27m ago•0 comments

We interfaced single-threaded C++ with multi-threaded Rust

https://antithesis.com/blog/2026/rust_cpp/
1•lukastyrychtr•28m ago•0 comments

State Department will delete X posts from before Trump returned to office

https://text.npr.org/nx-s1-5704785
7•derriz•28m ago•1 comment

AI Skills Marketplace

https://skly.ai
1•briannezhad•28m ago•1 comment

Show HN: A fast TUI for managing Azure Key Vault secrets written in Rust

https://github.com/jkoessle/akv-tui-rs
1•jkoessle•28m ago•0 comments

eInk UI Components in CSS

https://eink-components.dev/
1•edent•29m ago•0 comments

Discuss – Do AI agents deserve all the hype they are getting?

2•MicroWagie•32m ago•0 comments

ChatGPT is changing how we ask stupid questions

https://www.washingtonpost.com/technology/2026/02/06/stupid-questions-ai/
2•edward•33m ago•1 comment

Universal pre-training by iterated random computation

https://arxiv.org/abs/2506.20057
37•liamdgray•7mo ago

Comments

liamdgray•7mo ago
Abstract: "We investigate the use of randomly generated data for the sake of pre-training a model. We justify this approach theoretically from the perspective of algorithmic complexity, building on recent research that shows that sequence models can be trained to approximate Solomonoff induction. We derive similar, but complementary theoretical results. We show empirically that synthetically generated data can be used to pre-train a model before the data is seen. We replicate earlier results that models trained this way show zero-shot in-context learning across a variety of datasets, and that this performance improves with scale. We extend earlier results to real-world data, and show that finetuning a model after pre-training offers faster convergence and better generalization."
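
For a concrete picture of what pre-training on randomly generated data can look like: below is a minimal Python/PyTorch sketch that samples sequences from a freshly initialized, untrained LSTM (a later comment notes the paper's synthetic data comes from LSTMs). The function name and every hyperparameter here are invented for illustration and are not taken from the paper's code.

    import torch

    def sample_random_source(vocab_size=128, hidden=64, seq_len=256):
        # A random, untrained computation: embedding -> LSTM -> readout.
        embed = torch.nn.Embedding(vocab_size, hidden)
        lstm = torch.nn.LSTM(hidden, hidden, batch_first=True)
        head = torch.nn.Linear(hidden, vocab_size)
        tokens = [torch.randint(vocab_size, (1, 1))]  # random seed token
        state = None
        with torch.no_grad():
            for _ in range(seq_len - 1):
                out, state = lstm(embed(tokens[-1]), state)
                probs = torch.softmax(head(out[:, -1]), dim=-1)
                tokens.append(torch.multinomial(probs, 1))
        return torch.cat(tokens, dim=1).squeeze(0)  # (seq_len,) token ids

    # Each call draws a fresh random "program"; a pre-training corpus is
    # many such sequences, produced before any real data is seen.
    corpus = [sample_random_source() for _ in range(1000)]
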
bionhoward•7mo ago
This is a cool concept, but I can’t help wishing there were more comparison between the treatment group and a control group that doesn’t see any universal pretraining data.

It’s good to compare various model sizes, evaluation tasks, and random data generators. I just think the paper would prove its point more effectively if it showed that models of the same size that see this random data learn better from evaluation data later on.

They could even pit the model’s initial checkpoint from before universal pretraining against the pretrained checkpoint. If the method works, the one that did UP will win.

Maybe I’m way off; I’ll admit I’ve only skimmed it so far. It seems promising, I’m just wishing for some controls.
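
A sketch of the control being asked for here, where `build_model`, `pretrain_on_random`, `finetune`, and `eval_loss` are hypothetical stand-ins rather than anything from the paper: start both runs from identical weights, apply universal pretraining to only one, and fine-tune both with the same recipe.

    import copy

    def compare_up_vs_control(build_model, pretrain_on_random, finetune, eval_loss):
        control = build_model(seed=0)         # the initial checkpoint
        treated = copy.deepcopy(control)      # identical starting weights
        pretrain_on_random(treated)           # universal pretraining only here
        finetune(control)                     # same downstream recipe for both
        finetune(treated)
        return {"control": eval_loss(control),
                "universal_pretrain": eval_loss(treated)}
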

yorwba•7mo ago
In figures 2, 4, and 6, the top left end of the training curves represents models that have not seen any pretraining data. In figure 5, they're represented by dashed curves.
visarga•7mo ago
Results are modest, maybe 20-30% fewer training steps to reach target performance. This won't solve the problem of organic data exhaustion. We need 100x more data.

They didn't test against actual language model pretraining, only tested against a random init.

- A: Pre-trained on their synthetic LSTM data -> fine-tuned on Wikipedia

- B: Pre-trained on a different natural-language corpus -> fine-tuned on Wikipedia

- C: Random initialization -> fine-tuned on Wikipedia

They only test A vs C, not A vs B.
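
The same three conditions as an illustrative table in code (names invented; `None` stands for a plain random init):

    # Only A and C appear in the paper, per the comment above.
    CONDITIONS = {
        "A": ("synthetic LSTM data", "Wikipedia"),
        "B": ("another natural-language corpus", "Wikipedia"),
        "C": (None, "Wikipedia"),
    }
    # Reported comparison: A vs C. Missing baseline: A vs B.
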

WithinReason•7mo ago
This paper addresses the problem of running out of data. You can't do B when you've run out of data, so it's irrelevant.
impossiblefork•7mo ago
20-30% isn't modest. I think there is a big problem, though: it's character-level prediction.

It's not obvious how to generate this kind of good synthetic data when it's to be fed to a tokenized model.
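
One workaround a reader might try, though nothing in the paper does this: learn a tokenizer on the synthetic stream itself, so the random data and the downstream model share a vocabulary. A minimal, self-contained BPE training sketch:

    from collections import Counter

    def train_bpe(text, num_merges=100):
        tokens = list(text)                  # start from characters
        merges = []
        for _ in range(num_merges):
            pairs = Counter(zip(tokens, tokens[1:]))
            if not pairs:
                break
            (a, b), count = pairs.most_common(1)[0]
            if count < 2:
                break                        # nothing repeats; stop merging
            merges.append((a, b))
            merged, i = [], 0
            while i < len(tokens):           # greedily apply the new merge
                if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(tokens[i])
                    i += 1
            tokens = merged
        return merges, tokens

    # Toy usage: merges learned on a repetitive synthetic string.
    merges, tokens = train_bpe("abcabdabcabd" * 50, num_merges=8)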