
Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
1•ambitious_potat•1m ago•0 comments

Scams, Fraud, and Fake Apps: How to Protect Your Money in a Mobile-First Economy

https://blog.afrowallet.co/en_GB/tiers-app/scams-fraud-and-fake-apps-in-africa
1•jonatask•1m ago•0 comments

Porting Doom to My WebAssembly VM

https://irreducible.io/blog/porting-doom-to-wasm/
1•irreducible•1m ago•0 comments

Cognitive Style and Visual Attention in Multimodal Museum Exhibitions

https://www.mdpi.com/2075-5309/15/16/2968
1•rbanffy•3m ago•0 comments

Full-Blown Cross-Assembler in a Bash Script

https://hackaday.com/2026/02/06/full-blown-cross-assembler-in-a-bash-script/
1•grajmanu•8m ago•0 comments

Logic Puzzles: Why the Liar Is the Helpful One

https://blog.szczepan.org/blog/knights-and-knaves/
1•wasabi991011•19m ago•0 comments

Optical Combs Help Radio Telescopes Work Together

https://hackaday.com/2026/02/03/optical-combs-help-radio-telescopes-work-together/
2•toomuchtodo•24m ago•1 comment

Show HN: Myanon – fast, deterministic MySQL dump anonymizer

https://github.com/ppomes/myanon
1•pierrepomes•30m ago•0 comments

The Tao of Programming

http://www.canonical.org/~kragen/tao-of-programming.html
1•alexjplant•32m ago•0 comments

Forcing Rust: How Big Tech Lobbied the Government into a Language Mandate

https://medium.com/@ognian.milanov/forcing-rust-how-big-tech-lobbied-the-government-into-a-langua...
1•akagusu•32m ago•0 comments

PanelBench: We evaluated Cursor's Visual Editor on 89 test cases. 43 fail

https://www.tryinspector.com/blog/code-first-design-tools
2•quentinrl•34m ago•2 comments

Can You Draw Every Flag in PowerPoint? (Part 2) [video]

https://www.youtube.com/watch?v=BztF7MODsKI
1•fgclue•39m ago•0 comments

Show HN: MCP-baepsae – MCP server for iOS Simulator automation

https://github.com/oozoofrog/mcp-baepsae
1•oozoofrog•43m ago•0 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
3•DesoPK•47m ago•0 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
1•rs545837•48m ago•1 comment

Hello world does not compile

https://github.com/anthropics/claudes-c-compiler/issues/1
33•mfiguiere•54m ago•17 comments

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
3•meszmate•56m ago•0 comments

Metaphor + Metonymy: "To love that well which thou must leave ere long" (Sonnet 73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•58m ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•1h ago•1 comment

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•1h ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•1h ago•1 comment

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
4•gmays•1h ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•1h ago•1 comment

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•1h ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•1h ago•1 comment

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•1h ago•0 comments

The Search Engine Map

https://www.searchenginemap.com
1•cratermoon•1h ago•0 comments

Show HN: Souls.directory – SOUL.md templates for AI agent personalities

https://souls.directory
1•thedaviddias•1h ago•0 comments

Real-Time ETL for Enterprise-Grade Data Integration

https://tabsdata.com
1•teleforce•1h ago•0 comments

Economics Puzzle Leads to a New Understanding of a Fundamental Law of Physics

https://www.caltech.edu/about/news/economics-puzzle-leads-to-a-new-understanding-of-a-fundamental...
3•geox•1h ago•1 comment

VibeThinker-1.5B

https://github.com/WeiboAI/VibeThinker
62•tamnd•2mo ago

Comments

Alifatisk•2mo ago
The benchmarks look incredible, almost too good to be true. What am I missing?

Is this hosted online somewhere so I can try it out?

viraptor•2mo ago
It's so tiny you can download and run it locally on CPU with llama.cpp. It seems weirdly good at some simple Python questions. Definitely better than I'd expect from any model of that size.
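For anyone who wants to try this locally, here's a minimal sketch using the llama-cpp-python bindings. The GGUF filename and the sampler values below are placeholders, not confirmed recommendations from the repo:

```
# Sketch: run a small GGUF quantization of VibeThinker-1.5B on CPU with
# llama-cpp-python. The model filename and sampler values are assumptions;
# substitute whatever quantization you downloaded and the repo's settings.
from llama_cpp import Llama

llm = Llama(
    model_path="VibeThinker-1.5B.Q8_0.gguf",  # hypothetical filename
    n_ctx=4096,      # context window
    n_threads=8,     # CPU threads; tune to your machine
)

out = llm(
    "Write a Python function that checks whether a string is a palindrome.",
    max_tokens=512,
    temperature=0.6,  # placeholder; use the repo's recommended settings
    top_p=0.95,       # placeholder
)
print(out["choices"][0]["text"])
```
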
Balinares•2mo ago
I don't know how the coding benchmarks are computed, but this model, on its own and outside of agentic loops, definitely doesn't compare to e.g. Qwen3 Coder. I might still try it for fun, just to see how it performs given a feedback loop.

On math questions, though, besides a marked tendency toward rambling thinking, it's just plain implausibly good for a 1.5B model. That is probably just rote learning; otherwise this might well be a breakthrough.

reaslonik•2mo ago
While it's impressive that the output isn't completely undecipherable, my real-world queries for a Spring Boot project with the most popular libraries don't compare so favorably to their benchmarks against Qwen3 32B, which I also run regularly (a 4-bit quantized version). Explanation tasks break completely, and often.

I used their recommended temperature, top_k, top_p, and other sampler settings.

viraptor•2mo ago
Breaks as in the think block contains nonsense, or the output finishes early? I've had some thinking weirdness which doesn't seem to affect the final answer much.

Overall it still seems extremely good for its size, and I wouldn't expect anything below 30B to behave like that. I mean, it flies at 100 tok/s even on a 1650 :D

reaslonik•2mo ago
Breaks as in it contains words that grammatically work but don't make sense, mistakes the symbol | for a person, points back to things that didn't exist in the request, etc. I use templates like this for explanation questions:

from

```

excerpt of text or code from some application or site

```

What is the meaning of excerpt?

It just doesn't seem to work at a usable level. Coding questions get code that runs, but it almost always misses so many things that finding out what it missed and fixing it takes a lot more time than handwriting the code.

>Overall it still seems extremely good for its size, and I wouldn't expect anything below 30B to behave like that. I mean, it flies at 100 tok/s even on a 1650 :D

For its size, absolutely. I've not seen 1.5B models that even form sentences correctly most of the time, so this is miles ahead of most small models, just not at the levels the benchmarks would have you believe.

viraptor•2mo ago
Interesting, I haven't seen it actually return nonsense yet (some incorrect things and getting into thinking loops, but always coherent). I'm running it on the latest llama.cpp with the bf16 GGUF. What are you using?

reaslonik•2mo ago
I'm running the Hugging Face .safetensors with vLLM with as few startup parameters as possible. I thought it must not be sending the temperature correctly, but after setting temp to something else I got Chinese output, so it must be sending it.

Overall, if you're memory constrained it's probably still worth trying to fiddle around with it, if you can get it to work. Speed-wise, if you have the memory, a 5090 can get ~50-100 tok/s for a single query with 32B-AWQ, and way more if you have something parallel like open-webui.
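A minimal vLLM sketch of the setup described above, with the sampling parameters passed explicitly rather than left to defaults. The Hugging Face repo id and the sampler values are assumptions, not confirmed settings:

```
# Sketch: load the safetensors checkpoint with vLLM and pass the temperature
# explicitly, ruling out the "temp not being sent" issue mentioned above.
# The repo id and sampler values below are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="WeiboAI/VibeThinker-1.5B")  # assumed Hugging Face repo id

params = SamplingParams(
    temperature=0.6,   # placeholder; the model card's values should win
    top_p=0.95,        # placeholder
    max_tokens=2048,
)

outputs = llm.generate(["What is the meaning of `|` in a shell pipeline?"], params)
print(outputs[0].outputs[0].text)
```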

DeathArrow•2mo ago
Many interesting open weights models are coming from China.
Lapel2742•2mo ago
I'm pretty sure this is some kind of scientific achievement that I don't fully understand, but the real-world use cases for this model seem very limited.

I gave it two tasks: "Create a new and original story in 500 words" and "Write a Python console game". Both resulted in an endless loop with the model repeating itself.

To be honest, given that a 1B Granite Nano model has only minor problems (word count) with such tasks, and given that VibeThinker is announced as a programming model, it's disappointing to see a 1.5B model fail multiple times.

impossiblefork•2mo ago
It's specifically trained on maths. I don't think they care at all about general instruction following or stories.
Lapel2742•2mo ago
>> [...] achieving state-of-the-art performance in mathematical and *coding tasks* [...]

And it fails at one of the simplest coding tasks, where a Granite model of nearly half the size has no problems.

It's probably an important discovery but seemingly only usable in an academic context.

impossiblefork•2mo ago
The way I read the paper is that they've only tuned it on that maths dataset, so it's not made to have any coding ability.
impossiblefork•2mo ago
I don't quite understand the MGPO.

So during the final stage they try to ensure the model doesn't get the right answer every time, but only 50% of the time, so as to avoid killing all variability -- very sensible. Then they compute a measure of this, take the negative exponential of that measure, and scale the advantage by it.

So a question matters in proportion to the variability of its answers. Isn't this more curriculum learning than actually suppressing things that don't vary enough?

Basically focusing on questions that are still hard instead of trying to push the probability of solving problems it can already often solve to 99.99%?

Also very reasonable, but this isn't how they describe it. Instead, from their description I would think they're sort of forcing entropy to be high somehow.

I think the way I'd have titled it would be something like "Dynamic curricula to preserve model entropy".
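A small sketch of that reading of the reweighting. The deviation measure (distance of the per-question pass rate from 0.5) and the scale factor k are my assumptions, not the paper's actual formula:

```
# Sketch of the advantage reweighting as described in the comment above:
# per-question pass rate p is pushed toward 0.5, and each question's
# advantage is scaled by exp(-k * |p - 0.5|), so maximum-entropy
# (hardest-but-still-solvable) questions dominate the policy update.
import math

def entropy_guided_weight(pass_rate: float, k: float = 4.0) -> float:
    """Weight is 1.0 when the model solves a question 50% of the time,
    and decays exponentially as the pass rate nears 0 or 1."""
    deviation = abs(pass_rate - 0.5)   # distance from the max-entropy point
    return math.exp(-k * deviation)    # negative exponential of the measure

def scaled_advantages(advantages: list[float], pass_rate: float) -> list[float]:
    w = entropy_guided_weight(pass_rate)
    return [w * a for a in advantages]

# A question solved 95% of the time contributes far less than one solved
# 50% of the time -- the curriculum-like effect described above.
print(entropy_guided_weight(0.5))    # 1.0
print(entropy_guided_weight(0.95))   # ~0.165
```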