Potential and Limitation of High-Frequency Cores and Caches (2024)

https://arch.cs.ucdavis.edu/simulation/2024/08/06/potentiallimitationhighfreqcorescaches.html
30•matt_d•8mo ago

Comments

bob1029•8mo ago
> We also did not model the SERDES (serializer-deserializer) circuits that would be required to interface the superconductor components with the room-temperature components, which would have an impact on the performance of the workloads. Instead, we assumed that the interconnect is unchanged from CMOS.

I had a little chuckle when I got to this. I/O is the hard part. Getting the information from A to B.

IBM is probably pushing the practical limits with 5.5GHz base clock on every core. When you can chew through 10+ gigabytes of data per second per core, it becomes a lot less about what the CPU can do and more about what everything around it can do.

The software is usually the weak link in all of this. Disrespect the NUMA and nothing will matter. The layers of abstraction can make it really easy to screw this up.

PaulHoule•8mo ago
In a phase when I was doing a lot of networking I hooked up with a chip designer who familiarized me with the "memory wall": ASICs and FPGAs aren't quite the panacea they seem to be, because if you have a large working set you are limited by memory bandwidth and latency.

Note that faster-than-silicon electronics have been around for a while: the DOD put out an SBIR for a microprocessor based on indium phosphide in the 1990s, which I suspect is a real product today but secret. [1] Looking at what fabs provide, it seems one could make something a bit better than a 6502 that clocks out at 60 GHz, and maybe you can couple it to 64kb of static RAM, maybe more with 2.5-D packaging. You might imagine something like that would be good for electronic warfare, and for the simplest algorithms and waveforms it could buy a few ns of reduced latency, but for more complex algorithms modern chips get a lot of parallelism and are hard to beat on throughput.

[1] Tried talking with people who might know, nobody wanted to talk.

foota•8mo ago
I've read confidential proposals for chips with very high available memory bandwidth, but otherwise reduced performance compared to a standard general purpose CPU.

Something somewhere between a CPU and a GPU, that could handle many parallel streams, but at lower throughput than a CPU, and with very high memory bandwidth for tasks that need to be done against main memory. The niche here is for things like serialization and compression that need lots of bandwidth, can't be done efficiently on the GPU (not parallel), and waste precious time on the CPU.

PaulHoule•8mo ago
Like

https://en.wikipedia.org/wiki/UltraSPARC_T1

?

foota•8mo ago
Similar in concept. I think the idea is that it would be used as an application coprocessor, though, as opposed to the main processor, and obviously with a lot more threads.

I don't remember all the details, but picture a bunch of those attached to different parts of the processor hierarchy remotely, e.g., one per core or one per NUMA node, etc. The connection between the coprocessor and the processor can be thin, because the processor would just be sending commands to the coprocessor, so they wouldn't consume much of the constrained processor bandwidth, and each coprocessor would have a high bandwidth connection to memory.

saltcured•8mo ago
There was also the Tera MTA and various "processor-in-memory" research projects in academia.

Eventually, it's all full circle to supercomputer versus "hadoop cluster" again. Can you farm out work locally near bits of data, or does your algorithm effectively need global scope to "transpose" data and hit bisection bandwidth limits of your interconnect topology?

Veserv•8mo ago
I am not sure that is the case anymore. High Bandwidth Memory (HBM) [1] as used on modern ML training GPUs has immensely more memory bandwidth than traditional CPU systems.

DDR5 [2] tops out around 60-80 GB/s. HBM3, used on the H100 GPUs, tops out at 819 GB/s. 10-15x more bandwidth. At a 4 GHz clock, you need to crunch 200 bytes/clock to become memory bandwidth limited.

[1] https://en.wikipedia.org/wiki/High_Bandwidth_Memory

[2] https://en.wikipedia.org/wiki/DDR5_SDRAM
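
The bytes-per-clock figure above is simple arithmetic, and a small sketch makes it easy to rerun with other numbers. The function name `bytes_per_clock` is made up for illustration; the bandwidth and clock values are the ones quoted in the comment, not independent measurements.

```python
def bytes_per_clock(bandwidth_gb_s: float, clock_ghz: float) -> float:
    """Bytes a core must consume every cycle to saturate the given bandwidth."""
    return (bandwidth_gb_s * 1e9) / (clock_ghz * 1e9)


# Figures quoted in the comment: HBM3 at 819 GB/s, DDR5 at 60-80 GB/s, 4 GHz clock.
hbm3 = bytes_per_clock(819, 4.0)
ddr5_low = bytes_per_clock(60, 4.0)
ddr5_high = bytes_per_clock(80, 4.0)

print(f"HBM3: {hbm3:.2f} bytes/clock")                      # 204.75
print(f"DDR5: {ddr5_low:.0f}-{ddr5_high:.0f} bytes/clock")  # 15-20
print(f"HBM3 vs DDR5: {819 / 80:.1f}x to {819 / 60:.1f}x")  # 10.2x to 13.7x
```

With those inputs the HBM3 case works out to roughly 205 bytes per clock, which matches the "200 bytes/clock" estimate.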

ryao•8mo ago
The memory wall (also known as the Von Neumann bottleneck) is still true. Token generation on Nvidia GPUs is memory bound, unless you do very large batch sizes to become compute bound.

That said, more exotic architectures from Cerebras and Groq get far less tokens-per-second performance than their memory bandwidth suggests they can, so they have a bottleneck elsewhere.

Veserv•8mo ago
You get a memory bound on GPUs because they have so much more compute per memory. The H100 has 144 SMs driving 4x32 threads per clock. That is 18,432 threads demanding memory.

Now to be fair, that is separated into 8 clusters which I assume are connected to their own memory so you actually only have 576 threads sharing memory bandwidth. But that is still way more compute than any single processing element could ever hope to have. You can drown any individual processor in memory bandwidth these days unless you somehow produce a processor clocked at multiple THz.

The problem does not seem to be memory bandwidth, but cost, latency, and finding the cost-efficient compute-bandwidth tradeoff for a given task.
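
The thread count quoted above can be checked in a few lines. This is only a sketch: the SM and thread counts come from the comment itself, the 819 GB/s figure is borrowed from the earlier HBM3 comment, and dividing bandwidth evenly across threads is a simplifying assumption rather than how a GPU actually schedules memory traffic.

```python
# Numbers taken from the comment: 144 SMs, each driving 4 x 32 threads per clock.
sms = 144
threads_per_sm = 4 * 32
total_threads = sms * threads_per_sm

# Assumption: split the 819 GB/s HBM3 figure evenly across every thread.
bandwidth_bytes_s = 819e9
per_thread_bytes_s = bandwidth_bytes_s / total_threads

print(total_threads)                             # 18432
print(f"{per_thread_bytes_s / 1e6:.1f} MB/s per thread")
```

Under that even-split assumption each thread gets a few tens of MB/s, which illustrates why a GPU ends up memory bound even with far more raw bandwidth than a CPU.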

ryao•7mo ago
You can predict the token generation performance of a GPU or CPU by dividing the memory bandwidth by the size of the active parameters. By definition, that is a memory bandwidth bottleneck. I have no idea why you think it is not.

Anyone who has worked on inference code knows that memory bandwidth is the principal bottleneck for token generation. For example:

https://github.com/ryao/llama3.c
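
The rule of thumb above (memory bandwidth divided by the size of the active parameters) can be sketched as follows. The function name, the 8-billion-parameter model, and the 2 bytes per parameter are hypothetical values chosen for illustration; the estimate is an upper bound that ignores KV-cache traffic, batching, and compute limits.

```python
def est_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billions: float,
                          bytes_per_param: float) -> float:
    """Upper-bound tokens/s if every active parameter is read once per token."""
    active_bytes = active_params_billions * 1e9 * bytes_per_param
    return (bandwidth_gb_s * 1e9) / active_bytes


# Hypothetical model: 8B active parameters stored at 2 bytes each (16 GB).
on_hbm3 = est_tokens_per_second(819, 8, 2)  # bandwidth figure quoted earlier in the thread
on_ddr5 = est_tokens_per_second(80, 8, 2)   # upper end of the quoted DDR5 range

print(f"HBM3: ~{on_hbm3:.1f} tokens/s")  # ~51.2
print(f"DDR5: ~{on_ddr5:.1f} tokens/s")  # ~5.0
```

Halving `bytes_per_param` (for example through quantization) doubles the estimate, which is consistent with the claim that token generation is bandwidth bound at small batch sizes.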

PaulHoule•8mo ago
Certainly an ASIC or FPGA on a package with HBM could do more.

So far as exotic 10x clocked systems based on III-V semiconductors, SQUIDs, or something, I think memory does have to be packaged with the rest of it, because of speed-of-light issues.

markhahn•8mo ago
they're both DRAM, so have roughly the same performance per interface-bit-width and clock. you can see this very naturally by looking at higher-end CPUs, which have wider DDR interfaces (currently up to 12x64b per socket - not as wide as in-package HBM, but duh)