
Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
193•theblazehen•2d ago•56 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
678•klaussilveira•14h ago•203 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
954•xnx•20h ago•552 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
125•matheusalmeida•2d ago•33 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
25•kaonwarb•3d ago•21 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
62•videotopia•4d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
235•isitcontent•15h ago•25 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
227•dmpetrov•15h ago•121 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
38•jesperordrup•5h ago•17 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
332•vecti•17h ago•145 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
499•todsacerdoti•22h ago•243 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
384•ostacke•21h ago•96 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
360•aktau•21h ago•183 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
21•speckx•3d ago•10 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
291•eljojo•17h ago•182 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
413•lstoll•21h ago•279 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
6•matt_d•3d ago•1 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
20•bikenaga•3d ago•10 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
66•kmm•5d ago•9 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
93•quibono•4d ago•22 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
260•i5heu•17h ago•202 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
38•gmays•10h ago•12 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1073•cdrnsf•1d ago•458 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
60•gfortaine•12h ago•26 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
291•surprisetalk•3d ago•43 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
150•vmatsiiako•19h ago•71 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
8•1vuio0pswjnm7•1h ago•0 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
154•SerCe•10h ago•144 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
187•limoce•3d ago•102 comments

High-performance 2D graphics rendering on the CPU using sparse strips [pdf]

https://github.com/LaurenzV/master-thesis/blob/main/main.pdf
281•PaulHoule•2mo ago

Comments

miguel_martin•2mo ago
Also check out blaze: https://gasiulis.name/parallel-rasterization-on-cpu/
hollowturtle•2mo ago
The demo is astonishing.
raphlinus•2mo ago
Thanks for the pointer, we were not actually aware of this, and the claimed benchmark numbers look really impressive.
convolvatron•2mo ago
There were at least two renderers written for the CM2 that used strips. At least one of them used scans and general communication; most likely both did.

1) For the given processor set, where each processor holds an object, "spawn" a processor in a new set, one processor for each span.

(a) The spawn operation consists of the source processor setting the number of nodes in the new domain, then performing an add-scan, then sending the total allocation back to the front end. The front end then allocates a new power-of-2 shape that can hold those. The object-set then uses general communication to send scan information to the first of these in the strip-set (the address is left over from the scan).
(b) In the strip-set, use a mask-copy-scan to get all the parameters to all the elements of the scan set.
(c) Each of these elements of the strip set determines the pixel location of the leftmost element.
(d) Use a general send to seed the strip with the parameters of the strip.
(e) Scan those using a mask-copy-scan in the pixel-set.
(f) Apply the shader or the interpolation in the pixel-set.

Note that steps (d) and (e) also depend on encoding the depth information in the high bits and using a max combiner to perform z-buffering.

Edit: there must have been an additional span/scan in a pixel space that is then sent to image space with z-buffering, otherwise strip seeds could collide and be sorted by z, which may miss pixels from the losing strip.
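For illustration only (this is not the CM2 code, and the names and SpanParams fields are made up), a serial Rust sketch of the add-scan allocation and copy-scan seeding described in steps (a)-(e):

    // Hypothetical serial sketch: an exclusive add-scan over span lengths gives
    // each span its offset in the newly allocated strip/pixel set; parameters are
    // seeded at that offset and a copy-scan (here a plain fill) propagates them.
    #[derive(Clone, Copy, Default)]
    struct SpanParams {
        y: u16,
        z: u16, // depth, for the max-combiner z-buffering mentioned above
        color: u32,
    }

    fn allocate_and_seed(spans: &[(u32, SpanParams)]) -> Vec<SpanParams> {
        // exclusive add-scan over span lengths
        let mut offsets = Vec::with_capacity(spans.len());
        let mut total = 0u32;
        for &(len, _) in spans {
            offsets.push(total);
            total += len;
        }
        // the "front end" allocates the new set sized by the scan total
        let mut strip_set = vec![SpanParams::default(); total as usize];
        // seed each strip at its scan offset, then propagate the parameters across it
        for (&(len, params), &off) in spans.iter().zip(&offsets) {
            for elem in &mut strip_set[off as usize..(off + len) as usize] {
                *elem = params;
            }
        }
        strip_set
    }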

actionfromafar•2mo ago
What's a CM2? I tried searching combined with some graphics-related keywords but I just get weird stuff.
Lerc•2mo ago
Given the focus on parallelism and communication, maybe the Connection Machine 2?
pixelpoet•2mo ago
This looks interesting; recently I wrote some code for rendering high-precision N-body paths with millions of vertices[0]. I wonder if a GPU implementation of this RLE representation would work well and maintain simplicity.

[0] https://www.youtube.com/watch?v=rmyA9AE3hzM

amelius•2mo ago
Side question. Is there some kind of benchmark to test the correctness of renderers?
embedding-shape•2mo ago
Correctness of what exactly? It's a "render" of a reality-like environment, so all of them make some tradeoff somewhere and won't be 100% "correct", at least compared to reality :)
jmpeax•2mo ago
Correctness with respect to the benchmark. A slow reference renderer could produce the target image, and renderers would need to achieve exact or close reproduction of the reference. Otherwise, you could just make substantial approximations and claim a performance victory.
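For illustration, a minimal Rust sketch (hypothetical names) of that kind of check, comparing the fast renderer's output to the slow reference with a per-pixel tolerance:

    // Hypothetical sketch: require the fast renderer's output to stay within a
    // per-channel tolerance of the slow reference renderer's image.
    fn matches_reference(output: &[u8], reference: &[u8], tolerance: u8) -> bool {
        output.len() == reference.len()
            && output.iter().zip(reference).all(|(a, b)| a.abs_diff(*b) <= tolerance)
    }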
user____name•2mo ago
Bezier curves can generate degenerate geometry when flattened and stroke geometry has to handle edge cases. See for instance the illustration on the last page of the Polar Stroking paper: https://arxiv.org/pdf/2007.00308

There are also things like interpreting (conflating) coverage as alpha in analytical antialiasing methods, which can lead to visible hairline cracks.

qingcharles•2mo ago
I assume the parent commenter means avoiding things like rendering the same pixel twice for adjacent paths, and avoiding gaps between paths that share identical edges. These are common problems for fast renderers that take liberties with accuracy in favor of speed (e.g. greater numerical errors caused by using fixed point instead of floating point).
percentcer•2mo ago
This was the original goal of the Cornell box (https://en.wikipedia.org/wiki/Cornell_box, i.e. carefully measure the radiosity of a simple, real-world scene and then see how closely you can come to simulating it).

For realtime rendering, a common thing to do is to benchmark against a known-good offline renderer (e.g. Arnold, Octane).

mkl•2mo ago
That's for realistic 3D rendering, a totally different problem from 2D vector graphics.
fngjdflmdflg•2mo ago
Fascinating project. Based on section 3.9, it seems the output is in the form of a bitmap, so I assume you have to do a full memory copy to the GPU to display the image in the end. With Skia moving to WebGPU[0] and with WebGPU supporting compute shaders, I feel that 2D graphics is slowly becoming a solved problem in terms of portability and performance. Of course there are cases where you would want a CPU renderer. Interestingly, the web is sort of one of them, because you have to compile shaders at runtime on page load. I wonder if it could make sense in theory to have multiple stages to this, sort of like how JS JITs work, where you would start with a CPU renderer while the GPU compiles its shaders. Another benefit, as the author mentions, is binary size. WebGPU (via Dawn at least) is rather large.

[0] https://blog.chromium.org/2025/07/introducing-skia-graphite-...
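A minimal sketch of that staging idea (hypothetical types, not Skia's or Vello's API): fall back to the CPU path until the GPU pipeline reports its shaders as compiled.

    // Hypothetical sketch of a staged renderer: use the CPU path until the GPU
    // pipeline has finished compiling its shaders, then switch over.
    struct CpuRenderer;
    struct GpuRenderer;

    enum ActiveRenderer {
        Cpu(CpuRenderer),
        Gpu(GpuRenderer),
    }

    fn pick_renderer(cpu: CpuRenderer, gpu: Option<GpuRenderer>) -> ActiveRenderer {
        match gpu {
            Some(g) => ActiveRenderer::Gpu(g), // shaders ready: GPU path
            None => ActiveRenderer::Cpu(cpu),  // still compiling: CPU fallback
        }
    }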

raphlinus•2mo ago
The output of this renderer is a bitmap, so you have to do an upload to GPU if that's what your environment is. As part of the larger work, we also have Vello Hybrid which does the geometry on CPU but the pixel painting on GPU.

We have definitely thought about having the CPU renderer while the shaders are being compiled (shader compilation is a problem) but haven't implemented it.

fngjdflmdflg•2mo ago
In any interactive environment you have to upload to the GPU on each frame to output to a display, right? Or maybe integrated SoCs can skip that? Of course you only need to upload the dirty rects, but in the worst case the full image.

>geometry on CPU but the pixel painting on GPU

Wow. Is this akin to running just the vertex shader on the CPU?

raphlinus•2mo ago
It's analogous, but vertex shaders are just triangles, and in 2D graphics you have a lot of other stuff going on.

The actual process of fine rasterization happens in quads, so there's a simple vertex shader that runs on GPU, sampling from the geometry buffers that are produced on CPU and uploaded.

jcelerier•2mo ago
I regularly do remote VNC and X11 access on stuff like the Raspberry Pi Zero, and in these cases the GPU does not work; you won't be able to open a GL context at all. Also, whenever I update my kernel on Arch Linux I'm not able to open a GL context until I reboot, so I really need apps that don't need a GPU context just to show stuff.
zamadatix•2mo ago
For the Pi Zero you can force a headless HDMI output in the config and then use that instead of a virtual display to get working GPU with VNC.
actionfromafar•2mo ago
You can also trick any HDMI output into believing it's connected to a monitor.

One commercial product is:

https://eshop.macsales.com/item/NewerTech/ADP4KHEAD/

But I seem to recall there are dirt cheap hacks to do the same. I may be conflating it with the "resistor jammed into the DVI port" trick, which worked back in the VGA and DVI days. Memory unlocked: I did this to an old Mac Mini in a closet for some reason.

qingcharles•2mo ago
Surely not if the CPU and video output device share common RAM?

Or with old VGA, the display RAM was mapped to known system RAM addresses and the CPU would write directly to it (you could write to an off-screen buffer and flip for double/triple buffering).
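A minimal Rust sketch of that off-screen-buffer-and-flip scheme (hypothetical names, not tied to any real display API):

    // Hypothetical sketch: draw into a back buffer, then swap it with the front
    // buffer that the display (or a final blit) reads from.
    struct FrameBuffers {
        front: Vec<u32>, // currently displayed pixels
        back: Vec<u32>,  // pixels being drawn for the next frame
    }

    impl FrameBuffers {
        fn new(width: usize, height: usize) -> Self {
            FrameBuffers {
                front: vec![0; width * height],
                back: vec![0; width * height],
            }
        }

        fn flip(&mut self) {
            std::mem::swap(&mut self.front, &mut self.back);
        }
    }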

ChrisGreenHeur•2mo ago
It just depends on what architecture your computer has.

On a PC, the CPU typically has exclusive access to system RAM, while the GPU has its own dedicated VRAM. The graphics driver runs code on both the CPU and the GPU (the GPU has its own embedded processor), so data is constantly being copied back and forth between the two memory pools.

Mobile platforms like the iPhone or macOS laptops are different: they use unified memory, meaning the CPU and GPU share the same physical RAM. That makes it possible to allocate a Metal surface that both can access, so the CPU can modify it and the GPU can display it directly.

However, you won’t get good frame rates on a MacBook if you try to draw a full-screen, pixel-perfect surface entirely on the CPU; it just can’t push pixels that fast. But you can write a software renderer where the CPU updates pixels and the GPU displays them, without copying the surface around.

nicoburns•2mo ago
One place where a CPU renderer is particularly useful is in test runners (where the output of the test is an image/screenshot), or any other use case where the output is an image. In that case, the output never needs to get to the GPU, and indeed if you render on the GPU then you have to copy the image back!
Reason077•2mo ago
> "I assume you have to do a full memory copy to the GPU to display the image in the end."

On a unified memory architecture (e.g. Apple Silicon), that's not an expensive operation. No copy required.

raphlinus•2mo ago
Unfortunately graphics APIs suck pretty hard when it comes to actually sharing memory between CPU and GPU. A copy is definitely required when using WebGPU, and also on discrete cards (which is what these APIs were originally designed for). It's possible that using native APIs directly would let us avoid copies, but we haven't done that.
voidmain•2mo ago
The paper defines this structure

    struct Strip {
        x: u16,
        y: u16,
        alpha_idx_fill_gap: u32,
    }
which looks like it is 64 bits (8 bytes) in size,

and then says

> Since a single strip has a memory footprint of 64 bytes and a single alpha value is stored as u8, the necessary storage amounts to around 259 ∗ 64 + 7296 ≈ 24KB

am I missing something, or is it actually 259*8 + 7296 ≈ 9KB?
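For reference, a quick check of that arithmetic with the struct as defined in the paper (259 strips plus 7296 alpha bytes):

    // size_of::<Strip>() is 8 bytes (two u16 plus one u32), not 64
    struct Strip {
        x: u16,
        y: u16,
        alpha_idx_fill_gap: u32,
    }

    fn main() {
        assert_eq!(std::mem::size_of::<Strip>(), 8);
        println!("{} bytes", 259 * std::mem::size_of::<Strip>() + 7296); // 9368, i.e. ~9 KB
    }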

shoo•2mo ago
I think you are correct: the memory use of the implementation is overestimated in that paragraph, and as you suggest it is lower. From a quick skim, the benchmarks section focuses on comparing running time against other libraries; there isn't a comparison of storage.
Benjamin_Dobell•2mo ago
Admittedly I won't have time to go through the code. However, from a quick look at the thesis, there's a section on multi-threading.

Whilst it's still very possible this was a simple mistake, an alternate explanation could be that each strip is allocated to a unique cache line. On modern x86_64 systems, a cache line is 64 bytes. If the renderer is attempting to mitigate false sharing, then it may be allocating each strip in its own cache line, instead of contiguously in memory.
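To illustrate that hypothesis (the thesis itself may not do this), forcing 64-byte alignment in Rust would indeed round the struct up to a full cache line:

    // Speculative sketch: 64-byte alignment pads the 8-byte strip to a whole
    // cache line, avoiding false sharing between threads at an 8x memory cost.
    #[repr(align(64))]
    struct PaddedStrip {
        x: u16,
        y: u16,
        alpha_idx_fill_gap: u32,
    }

    fn main() {
        assert_eq!(std::mem::size_of::<PaddedStrip>(), 64);
    }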

Vallaaaris•2mo ago
Hi, author here. You are right, it seems like I mixed up bytes and bits here. Embarrassing mistake, thanks for catching this!
swiftcoder•2mo ago
Off-topic, but when did GitHub's PDF preview start to only load a few pages at a time? I'd much rather they delivered the whole PDF and let my browser handle the PDF rendering...
thisOtterBeGood•2mo ago
Interesting. What I would like to see is a single-core comparison of the compared renderers, since that would indicate the efficiency of the code. I would assume the popular renderers are not as fast, but also need less CPU time overall?
Vallaaaris•2mo ago
There is a section on single-core performance comparison in the thesis!

Alternatively, you can also check the results from the official Blend2D benchmarks: https://blend2d.com/performance.html

Or my version where I added some more renderers to the existing ones: https://laurenzv.github.io/vello_chart/

dxroshan•2mo ago
Is one of the advisors, Raph Levien, the author of the old Libart library?
CrimsonCape•2mo ago
Yes.