
Potential and Limitation of High-Frequency Cores and Caches (2024)

https://arch.cs.ucdavis.edu/simulation/2024/08/06/potentiallimitationhighfreqcorescaches.html
30•matt_d•8mo ago

Comments

bob1029•8mo ago
> We also did not model the SERDES (serializer-deserializer) circuits that would be required to interface the superconductor components with the room-temperature components, which would have an impact on the performance of the workloads. Instead, we assumed that the interconnect is unchanged from CMOS.

I had a little chuckle when I got to this. I/O is the hard part. Getting the information from A to B.

IBM is probably pushing the practical limits with 5.5GHz base clock on every core. When you can chew through 10+ gigabytes of data per second per core, it becomes a lot less about what the CPU can do and more about what everything around it can do.

The software is usually the weak link in all of this. Disrespect the NUMA and nothing will matter. The layers of abstraction can make it really easy to screw this up.

PaulHoule•8mo ago
During a phase when I was doing a lot of networking, I worked with a chip designer who familiarized me with the "memory wall": ASICs and FPGAs aren't quite the panacea they seem to be, because if you have a large working set you are limited by memory bandwidth and latency.

Note that faster-than-silicon electronics have been around for a while. The DOD put out an SBIR for a microprocessor based on indium phosphide in the 1990s, which I suspect is a real product today but secret. [1] Looking at what fabs provide, it seems one could make something a bit better than a 6502 that tops out at 60 GHz, coupled to maybe 64 KB of static RAM, or more with 2.5D packaging. You might imagine something like that would be good for electronic warfare: for the simplest algorithms and waveforms it could buy a few ns of reduced latency, but for more complex algorithms modern chips get a lot of parallelism and are hard to beat on throughput.

[1] Tried talking with people who might know, nobody wanted to talk.

foota•8mo ago
I've read confidential proposals for chips with very high available memory bandwidth, but otherwise reduced performance compared to a standard general-purpose CPU.

Something somewhere between a CPU and a GPU, that could handle many parallel streams, but at lower throughput than a CPU, and with very high memory bandwidth for tasks that need to be done against main memory. The niche here is for things like serialization and compression that need lots of bandwidth, can't be done efficiently on the GPU (not parallel), and waste precious time on the CPU.

PaulHoule•8mo ago
Like

https://en.wikipedia.org/wiki/UltraSPARC_T1

?

foota•8mo ago
Similar in concept, I think the idea is that it would be used as an application coprocessor though, as opposed to the main processor, and obviously a lot more threads.

I don't remember all the details, but picture a bunch of those attached remotely to different parts of the processor hierarchy, e.g., one per core or one per NUMA node. The connection between the coprocessor and the processor can be thin, because the processor would just be sending commands to the coprocessor, so they wouldn't consume much of the constrained processor bandwidth, and each coprocessor would have a high-bandwidth connection to memory.

saltcured•8mo ago
There was also the Tera MTA and various "processor-in-memory" research projects in academia.

Eventually, it all comes full circle to supercomputer versus "hadoop cluster" again. Can you farm out work locally near bits of data, or does your algorithm effectively need global scope to "transpose" data and hit the bisection bandwidth limits of your interconnect topology?

Veserv•8mo ago
I am not sure that is the case anymore. High Bandwidth Memory (HBM) [1] as used on modern ML training GPUs has immensely more memory bandwidth than traditional CPU systems.

DDR5 [2] tops out around 60-80 GB/s. HBM3, used on the H100 GPUs, tops out at 819 GB/s. 10-15x more bandwidth. At a 4 GHz clock, you need to crunch 200 bytes/clock to become memory bandwidth limited.

[1] https://en.wikipedia.org/wiki/High_Bandwidth_Memory

[2] https://en.wikipedia.org/wiki/DDR5_SDRAM
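
A quick way to check the bytes-per-clock arithmetic above is the sketch below. It only divides the bandwidth figures quoted in this comment by an assumed 4 GHz clock; the function name is made up for illustration.

```python
def bytes_per_clock(bandwidth_gb_s: float, clock_ghz: float) -> float:
    """Bytes a single core must consume every cycle to saturate the bandwidth."""
    # GB/s divided by Gcycles/s leaves bytes per cycle.
    return bandwidth_gb_s / clock_ghz

# Bandwidth figures are the ones quoted in the comment; 4 GHz is the assumed clock.
hbm3 = bytes_per_clock(819, 4.0)
ddr5_low = bytes_per_clock(60, 4.0)
ddr5_high = bytes_per_clock(80, 4.0)
print(f"HBM3: {hbm3:.0f} B/clock, DDR5: {ddr5_low:.0f}-{ddr5_high:.0f} B/clock")
```

With these inputs, HBM3 works out to roughly 205 bytes per clock, which is where the "200 bytes/clock" figure comes from.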

ryao•8mo ago
The memory wall (also known as the Von Neumann bottleneck) is still true. Token generation on Nvidia GPUs is memory bound, unless you do very large batch sizes to become compute bound.

That said, more exotic architectures from Cerebras and Groq get far lower tokens-per-second performance than their memory bandwidth suggests they could, so they have a bottleneck elsewhere.

Veserv•8mo ago
You get a memory bound on GPUs because they have so much more compute per memory. The H100 has 144 SMs driving 4x32 threads per clock. That is 18,432 threads demanding memory.

Now to be fair, that is separated into 8 clusters, which I assume are connected to their own memory, so you actually only have 2,304 threads (18,432 / 8) sharing memory bandwidth. But that is still way more compute than any single processing element could ever hope to have. You can drown any individual processor in memory bandwidth these days unless you somehow produce a processor clocked at multiple THz.

The problem does not seem to be memory bandwidth, but cost, latency, and finding the cost-efficient compute-bandwidth tradeoff for a given task.
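
To make the "compute per memory" point concrete, here is a minimal sketch of the thread count from this comment, plus the memory bandwidth left per thread per clock. The SM and warp figures come from the comment; the 819 GB/s figure is the HBM3 number quoted earlier in the thread, and the 1.5 GHz GPU clock is purely an assumption for illustration.

```python
SMS = 144              # streaming multiprocessors, figure from the comment
WARPS_PER_CLOCK = 4    # warps driven per SM per clock, from the comment
THREADS_PER_WARP = 32

def threads_per_clock(sms: int = SMS) -> int:
    """Threads demanding memory each clock across the whole GPU."""
    return sms * WARPS_PER_CLOCK * THREADS_PER_WARP

def bytes_per_thread_per_clock(bandwidth_gb_s: float, clock_ghz: float) -> float:
    """Memory bandwidth available to each thread on every clock."""
    return bandwidth_gb_s / clock_ghz / threads_per_clock()

print(threads_per_clock())
# 1.5 GHz is an assumed clock, not a figure from the thread.
print(bytes_per_thread_per_clock(819, 1.5))
```

Under these assumptions each thread gets a small fraction of a byte per clock, which is why a GPU can be memory bound even with far more raw bandwidth than a CPU.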

ryao•7mo ago
You can predict the token generation performance of a GPU or CPU by dividing the memory bandwidth by the size of the active parameters. By definition, that is a memory bandwidth bottleneck. I have no idea why you think it is not.

Anyone who has worked on inference code knows that memory bandwidth is the principal bottleneck for token generation. For example:

https://github.com/ryao/llama3.c
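
The rule of thumb above (memory bandwidth divided by the size of the active parameters) can be sketched as follows. This is a rough upper bound under the assumption that every generated token reads all active weights once; it ignores KV-cache traffic and batching, and the example model size is hypothetical.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bytes_per_param: float) -> float:
    """Upper bound on decode speed when each token streams all active weights."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    active_gb = active_params_billion * bytes_per_param
    return bandwidth_gb_s / active_gb

# Hypothetical example: 8B active parameters in 16-bit weights, 819 GB/s of memory bandwidth.
print(round(max_tokens_per_second(819, 8, 2), 1))
```

Halving the bytes per parameter (for example with 8-bit weights) doubles the bound, which matches the usual experience that quantization speeds up single-stream token generation.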

PaulHoule•8mo ago
Certainly an ASIC or FPGA on a package with HBM could do more.

As for exotic 10x-clocked systems based on III-V semiconductors, SQUIDs, or something similar, I think memory does have to be packaged with the rest of it, because of speed-of-light issues.
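
The speed-of-light constraint can be put in numbers with a small sketch: how far a signal travels in one clock period. The vacuum figure is a hard upper bound; the 0.5c velocity factor for on-package wiring is only an assumed ballpark, and the clock rates are examples (60 GHz is the figure mentioned earlier in the thread).

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum

def mm_per_cycle(clock_ghz: float, velocity_factor: float = 1.0) -> float:
    """Distance a signal covers in one clock period, in millimetres."""
    period_s = 1.0 / (clock_ghz * 1e9)
    return C_M_PER_S * velocity_factor * period_s * 1e3

for ghz in (4, 40, 60):
    print(f"{ghz} GHz: {mm_per_cycle(ghz):.1f} mm in vacuum, "
          f"{mm_per_cycle(ghz, 0.5):.1f} mm at an assumed 0.5c")
```

At 60 GHz a signal covers about 5 mm per cycle even in vacuum, so a round trip to memory a few centimetres away already costs many cycles.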

markhahn•8mo ago
they're both DRAM, so have roughly the same performance per interface-bit-width and clock. you can see this very naturally by looking at higher-end CPUs, which have wider DDR interfaces (currently up to 12x64b per socket - not as wide as in-package HBM, but duh)
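
The point about interface width and clock can be sketched as peak bandwidth = channels x width x transfer rate. The 12x64b figure is from the comment; the transfer rates and the 1024-bit HBM3 stack width are assumptions filled in for illustration.

```python
def peak_bandwidth_gb_s(channels: int, bits_per_channel: int,
                        mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth of a DRAM interface in GB/s."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# 12 x 64-bit DDR5 channels per socket (from the comment) at an assumed 4800 MT/s.
print(peak_bandwidth_gb_s(12, 64, 4800))
# One HBM3 stack, assumed 1024 bits wide at 6400 MT/s per pin.
print(peak_bandwidth_gb_s(1, 1024, 6400))
```

Under these assumptions the per-pin rates are in the same range, and most of the gap between a DDR5 socket and an HBM package comes from how many bits wide the interface is.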