frontpage.

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•59s ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
1•Brajeshwar•5m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
2•Brajeshwar•5m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
1•Brajeshwar•5m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•8m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•11m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•12m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•12m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•13m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•18m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•23m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•27m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•28m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•29m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
5•okaywriting•36m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•38m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•39m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•40m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•41m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•41m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•42m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•42m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•46m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•46m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•47m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•48m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•56m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•56m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
2•surprisetalk•58m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•58m ago•0 comments

LoRA Without Regret

https://thinkingmachines.ai/blog/lora/
184•grantpitt•4mo ago

Comments

Yenrabbit•4mo ago
Thinking Machines have put out a string of incredibly high-quality posts lately. Hard to oversell how much cred it's buying them with the AI research community! Keep up the great work folks
sudohalt•4mo ago
[flagged]
dang•4mo ago
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

mijoharas•4mo ago
What else has there been? I've only seen this one (which is great!)
joloooo•4mo ago
Their "Defeating Nondeterminism in LLM Inference" post was interesting to me. Worth reading their others!
_def•4mo ago
Took me a moment to realize this is not about LoRa.
ellisv•4mo ago
I also mistook it to be about LoRa and not about LoRA
chrystalkey•4mo ago
I too fell victim to mistaking LoRA for LoRa
logannyeMD•4mo ago
Missed opportunity to title this "Lo-RAgrets"
HumblyTossed•4mo ago
The name gets me every single time. Always think it’s going to be about radio LoRa
dannyfritz07•4mo ago
Dang it! Got me too! I've been wanting to hop into Meshtastic lately.
ijustlovemath•4mo ago
Set up a node! Bare boards that work with the app are like $50 and take a few clicks to flash and set up. The basic antenna with no amp makes contacts up to 50mi away if the conditions are right. I have one in a window and one in a backpack at all times.
jacquesm•4mo ago
It's insane how far you can go between hops, really most impressive. Where I live the mesh density is fairly high but I've also tried it in places where it was vanishingly low and yet I never completely lost contact. LoRa is very much an underappreciated technology.
wkjagt•4mo ago
I have a couple of nodes up, but not seeing a lot of traffic
mrandish•4mo ago
Yeah, kinda disappointed it's just more AI stuff...
canadiantim•4mo ago
I thought it was Lora the CRDT implementation, but then realized that's Loro.
halfmatthalfcat•4mo ago
Same - sad it's not.
moffkalast•4mo ago
No such thing as LoRa and LoRaWAN without regret, I'm afraid: all the range but no throughput.
halfmatthalfcat•4mo ago
You can do a lot with 255 bytes (SF5-8), just have to be creative :)
sifar•4mo ago
And I thought you were going to say thinking machines :). But yeah, LoRA trips me up too.
papascrubs•4mo ago
Not just me then. It's always the first thing that springs to mind.
apple4ever•4mo ago
Nope, not just you! Gets me every time.
CaptainOfCoit•4mo ago
Microsoft's inability to properly name things once again introduces more confusion than clarity. Thanks, Microsoft :)

At this point I think they do it on purpose, since their metrics for "people visiting the website/repository" or whatever get a boost from people thinking the repository is about the existing concept/technology.

dvfjsdhgfv•4mo ago
By the way, some time ago when I checked there were two cool applications of LoRa: (1) a mesh, for (hopefully) truly decentralized and more difficult to disrupt communication, (2) a gateway, so that you could get data from your sensors in remote places via standard internet protocols.

Both are very cool, but I wonder if I missed something else?

eagsalazar2•4mo ago
Stupid website hijacks cmd-back-arrow.
markisus•4mo ago
Can someone explain the bit counting argument in the reinforcement learning part?

I don’t get why a trajectory would provide only one bit of information.

Each step of the trajectory is at least giving information about what state transitions are possible.

An infinitely long trajectory can explore the whole state space if there are no absorbing states. Such a trajectory would provide a massive amount of information about the system, even if we ignored the final reward.

mountainriver•4mo ago
A fair amount of research has shown that RL doesn't add knowledge to the base model; it just optimizes paths that already exist. That said, ProRL from Nvidia showed there are ways of adding knowledge, mostly through progressive merging.

I'm still not fully convinced by the 1-bit claim; they made other mistakes in the blog post.

navar•4mo ago
I believe it's because of the way you measure things in RL: each episode only tells you whether it was good (say, reward +1) or bad (say, 0 or a negative reward); it does not tell you anything about the trace that was produced to get that outcome. This reward is the only signal used to produce your gradients, hence the amount of information in it is O(1).

This is in contrast to more "supervised" forms of learning, where you get a loss for each token produced (e.g. cross-entropy loss) and where, as a consequence, you'd get O(number of tokens) of information into your gradients.
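
A minimal sketch of that contrast in PyTorch (toy shapes; the REINFORCE-style loss is illustrative, not code from the post):

    import torch
    import torch.nn.functional as F

    vocab, T = 50_000, 200  # vocab size, tokens in one episode
    logits = torch.randn(T, vocab, requires_grad=True)

    # Supervised fine-tuning: every token carries a target, so the gradient
    # receives roughly one label's worth of signal per token.
    targets = torch.randint(vocab, (T,))
    sft_loss = F.cross_entropy(logits, targets)

    # Outcome-reward RL (REINFORCE with a single episode-level score): the whole
    # trace is weighted by one scalar, so the label contributes O(1) information.
    log_probs = F.log_softmax(logits, dim=-1)
    sampled = torch.multinomial(log_probs.exp(), 1).squeeze(-1)
    reward = 1.0  # episode judged "good"; 0.0 if "bad"
    rl_loss = -(reward * log_probs[torch.arange(T), sampled]).sum()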

mountainriver•4mo ago
> LoRA works well when not capacity constrained, i.e., the number of trainable parameters exceeds the amount of information to be learned, which can be estimated in terms of dataset size

I'm shocked they didn't look at progressive merging of LoRAs. Research shows that's the best way of improving LoRA's ability to model higher-level features.

Seems like a massive miss, not to mention there is other research that contradicts a lot of their findings. This feels a bit like a researcher's first pass at learning LoRA.
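
As a back-of-the-envelope illustration of the quoted capacity claim (the model shape and dataset size below are made-up assumptions, not numbers from the post):

    # Trainable parameters of a rank-16 LoRA over the attention projections
    # of a hypothetical 32-layer, 4096-wide model.
    hidden, layers, rank = 4096, 32, 16
    matrices_per_layer = 4  # e.g. q/k/v/o projections
    lora_params = layers * matrices_per_layer * 2 * rank * hidden  # A and B factors

    dataset_tokens = 5_000_000  # hypothetical fine-tuning set

    print(f"trainable LoRA params: {lora_params:,}")  # ~16.8M here
    print(f"dataset size in tokens: {dataset_tokens:,}")
    # The quoted claim: LoRA is "not capacity constrained" when the trainable
    # parameter count comfortably exceeds the information content of the dataset.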

yenepho•4mo ago
I am curious, would you mind sharing a citation?
Mkengin•4mo ago
https://arxiv.org/abs/2311.13600

https://arxiv.org/abs/2410.22911

https://arxiv.org/abs/2409.16167

mountainriver•4mo ago
Don’t forget ReLoRA! https://arxiv.org/abs/2307.05695
let_tim_cook_•4mo ago
I'm not sure why progressive LoRA merging needs to be addressed here. They show there is a regime of problems where LoRA performs equivalently to FullFT.

Progressive merging of LoRAs is somewhere in between and categorically more complex than plain LoRA, so it would be dominated by standard LoRA in that case.

While progressive merging could train faster, since fewer params are trainable at any given time, it results in much larger adapter diffs, on the order of the size of the original model, and I'd think it doesn't retain the benefit of being able to deploy multiple adapters over the same base model.

raaron773•4mo ago
The number of people who mistook this for long-range radio and were disappointed when it wasn't is way too damn high. (This is including me.)
ineedasername•4mo ago
It might be useful to use this thread in a dataset to train a LoRA so that LLM agents can more easily disambiguate the great LoRa acronym collision of '25. No longer will future generations suffer the indignity of either/or/both confusions.
kouteiheika•4mo ago
> However, the literature is unclear on how well LoRA performs relative to FullFT.

I think the literature is clear on that?

"LoRA vs Full Fine-tuning: An Illusion of Equivalence" -- https://arxiv.org/abs/2410.21228v1

Quoting from the conclusions:

> The paper describes the finding that LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution. We found that LoRA and full fine-tuning yield models with significant differences in the spectral properties of their weight matrices: LoRA models often containing “intruder dimensions”, high-ranking singular vectors approximately orthogonal to the singular vectors of pre-trained weight matrices. The existence of intruder dimensions correlates with the fine-tuned model forgetting more of the pre-training distribution as well as forgetting more when trained on tasks sequentially in a continual learning setup.

I'm surprised they didn't cite this; it's a well known paper.
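
A hedged sketch of the intruder-dimension check that paper describes (comparing top singular vectors of a fine-tuned weight matrix against the pretrained ones; the threshold and function name are illustrative, not the paper's code):

    import torch

    def intruder_dimensions(W_pre, W_ft, k=10, tau=0.3):
        # Left singular vectors of the pretrained and fine-tuned weights.
        U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
        U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
        intruders = []
        for j in range(k):
            # Max |cosine similarity| of the j-th fine-tuned singular vector
            # against every pretrained singular vector.
            sims = (U_pre.T @ U_ft[:, j]).abs()
            if sims.max() < tau:  # nearly orthogonal to all of them
                intruders.append(j)
        return intruders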

adhi01•4mo ago
To say that the 'literature is clear on that' while citing a single paper, which has been rejected from ICLR, is a bit of an overstatement.
muragekibicho•4mo ago
Thanks for this comment.
kouteiheika•4mo ago
> which has been rejected from ICLR

Oh, you mean rejected just like these papers?

Efficient Estimation of Word Representations in Vector Space[1], one of the most influential papers in the space with tens of thousands of citations[2]? Or the RoBERTa[3] paper (dramatically improved upon BERT; RoBERTa and derived models currently have tens of millions of downloads on HF and still serve as a reliable industry workhorse)? Or the Mamba paper[4] (pretty much the only alternative to transformers that actually gets used)? Do you want me to keep going?

Honestly, I find that whether a paper gets rejected or not means diddly squat, considering how broken the review system is and how many honestly terrible papers I have to wade through every time I look through the conference submissions for anything good.

[1] -- https://openreview.net/forum?id=idpCdOWtqXd60

[2] -- https://scholar.google.com/scholar?cites=7447715766504981253

[3] -- https://openreview.net/forum?id=SyxS0T4tvS

[4] -- https://openreview.net/forum?id=AL1fq05o7H

moralestapia•4mo ago
Based.

This guy knows his stuff.

p1esk•4mo ago
Even that paper itself does not provide any "clear" conclusions about which method is better.
lelanthran•4mo ago
> I'm surprised they didn't cite this; it's a well known paper.

I'm surprised you copied and pasted all of that without explaining what it means.

Does LoRA perform worse than, better than, or not statistically significantly differently from FullFT?

You aren't able to tell from what you pasted, are you?

crimsoneer•4mo ago
If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?
p1esk•4mo ago
The paper does not make any clear conclusions about LoRA vs FullFT performance, beyond "the two methods seem to be learning different things".
lelanthran•4mo ago
> If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?

The answer is "There's a difference, perhaps", but the GP appeared to imply that LoRA performed worse.

My understanding is that the paper found differences but did not conclude whether those differences were quantifiably better or worse, which is not the impression GP's post gave.

cheald•4mo ago
Standard LoRA (W_delta = B@A with standard inits) generally underperforms FT, primarily because of "intruder dimensions" (new high-ranking singular vectors which misalign with the singular vectors of the underlying weights) as outlined in the paper.

There are techniques like PiCa and SVFT which can mitigate much of the loss, though.
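
For reference, a minimal sketch of the standard parameterization being discussed (delta_W = B @ A, small random init for A, zeros for B; the wrapper class is illustrative, not code from the paper or the post):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
            super().__init__()
            self.base = base  # frozen pretrained projection W
            self.base.weight.requires_grad_(False)
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # random init
            self.B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init, so delta starts at 0
            self.scale = alpha / r

        def forward(self, x):
            # y = x W^T + scale * x A^T B^T  ==  x (W + scale * B A)^T
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)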

tangjurine•4mo ago
PiCa came out two days ago; how did you find out about it?
cheald•4mo ago
The one I was referring to was from this paper, first published in May: https://arxiv.org/abs/2505.20211v1

I don't recall how I found out about it, but it was either paperswithcode or an LLM research session working through the intruder dimensions problem.

In my Stable Diffusion tests, it substantially improves LoRA training speed and fidelity, though I've got some experiments that seem to improve on it even further by adding learnable rotations of the singular vectors.

richardvsu•4mo ago
Why would they cite a paper that’s not helping with their Tinker API that was released soon after? :)
rco8786•4mo ago
I've been curious about LoRA and find a lot of these articles interesting. But I've been unable to find a good "LoRA for idiots" kind of starting point that gets me started actually doing some training with my data. Anybody know of a more practical guide I could use for that?
CaptainOfCoit•4mo ago
Unsloth's documentation probably gets as close to practical as it gets: https://docs.unsloth.ai/get-started/fine-tuning-llms-guide

Be sure to validate everything you're reading, though; of late I've come across more and more things in their docs that don't seem 100% accurate. It seems to depend heavily on the section.
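
If you want something even more minimal to start from, a rough sketch with Hugging Face PEFT (the model name and target modules here are placeholder assumptions; adjust for your base model):

    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # depends on the architecture
        task_type=TaskType.CAUSAL_LM,
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the LoRA matrices are trainable
    # From here, train with your usual Trainer / training loop on your data.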

ijk•4mo ago
My sense is they need to go back and update previous docs; they release a lot of software updates and a lot of notebooks showing how to use the features, but the two might fall out of sync. Would that match your observations?
sgt101•4mo ago
Question for dudes building modern NNs... what's the thinking on estimating structural capacity for a real-world problem? How should I estimate how many parameters to choose for the model?
p1esk•4mo ago
You test different models on your real world problem, and pick the smallest one that works.
sgt101•4mo ago
I just think that there has to be some heuristic...
BoorishBears•4mo ago
Closest thing to a heuristic is trying the task with non fine-tuned models and building an intuition for how far off each model is, what directions it's off in, and how easily you can improve that direction via fine-tuning.

For example, for classification, if it is hallucinating semantically similar but not technically valid classes, you can probably fine-tune your way out of the gap with a smaller model.

But if your task requires world knowledge, you likely need a larger model. It's not cheap, efficient, or generally useful to fine-tune for additional world knowledge directly.

_spduchamp•4mo ago
Well since we all thought this was about Meshtastic stuff, let's just give in and make this that radio/Meshtastic comment thread.

Stumbled on this today... https://hackerpager.net/

I really want something like this with a flip-out keyboard that could do Signal on LTE/WiFi.

lewtun•4mo ago
For those interested in playing with an implementation of these ideas, my colleagues at HF made some recipes here: https://github.com/huggingface/trl/blob/main/docs/source/lor...