
Vortex: An extensible, state of the art columnar file format

https://github.com/vortex-data/vortex
115•tanelpoder•2mo ago

Comments

sys13•2mo ago
How does this compare with delta lake and iceberg?
oa335•2mo ago
Vortex is a file format, whereas Delta Lake and Iceberg are table formats. It should be compared to Parquet rather than to Delta Lake and Iceberg. This guest lecture by a Vortex maintainer gives a good overview of the file format, the motivation for creating it, and its key features.

https://www.youtube.com/watch?v=zyn_T5uragA

sys13•2mo ago
I think it would still make sense to compare with those table formats, or is the idea that you would only use this if you could not use a table format?
bz_bz_bz•2mo ago
That’s like comparing words with characters.

Vortex is, roughly, how you save data to files and Iceberg is the database-like manager of those files. You’ll soon be able to run Iceberg using Vortex because they are complementary, not competing, technologies.
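To make the file-format vs. table-format split concrete, here is a minimal sketch of what a table format adds on top of data files. `ToyTable` and `Snapshot` are hypothetical names, and this is not Iceberg's API or metadata layout (real Iceberg tracks files through metadata files, manifest lists, and manifests). The point is only that the table layer records *which files* make up each committed version and never looks inside them, so the files themselves could be Parquet, Vortex, or anything else.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One committed version of the table: an id plus a list of data files."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical, heavily simplified table format (not Iceberg's real design).

    The file format decides how bytes are laid out *inside* each data file;
    this table layer only tracks which files belong to each table version.
    """
    snapshots: list[Snapshot] = field(default_factory=list)

    def current_files(self) -> tuple[str, ...]:
        return self.snapshots[-1].data_files if self.snapshots else ()

    def append(self, new_files: list[str]) -> Snapshot:
        # A commit never rewrites existing data files; it publishes a new
        # snapshot whose file list is the previous list plus the new files.
        snap = Snapshot(len(self.snapshots), self.current_files() + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id: int) -> tuple[str, ...]:
        # "Time travel": return the file list of an older snapshot.
        return self.snapshots[snapshot_id].data_files


table = ToyTable()
table.append(["part-0.parquet"])
table.append(["part-1.vortex"])  # the table layer does not care about the file format
print(table.current_files())     # ('part-0.parquet', 'part-1.vortex')
print(table.files_as_of(0))      # ('part-0.parquet',)
```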

ks2048•2mo ago
The website could use a comparison with Parquet that explains the motivation (beyond just stating it's 100x better).
3eb7988a1663•2mo ago
Agreed, this really needs a tl;dr, because Parquet is boring technology. It's going to take quite the sales pitch to get people to move. At minimum, I assume it will be years before I could expect native integration in pandas/polars/etc., which is what would make it low-effort enough to consider.

Parquet is... fine, I guess. It is good enough. Why invite churn? Sell me on the vision.

frisbm•2mo ago
DuckDB just added support for Vortex in its last release, using the Vortex Python package, so hopefully other tools won't be too far behind.
bsder•2mo ago
> Going to require quite the sales pitch to move.

Mutability would be one such pitch I would like to see ...

cpard•2mo ago
As others said, Vortex is complementary to the table formats you mentioned.

There are other formats though that it can be compared to.

The Lance columnar format is one: https://github.com/lancedb/lancedb

And Nimble from Meta is another: https://github.com/facebookincubator/nimble

Parquet is so core to data infra, and so widespread, that removing it from its throne is a really, really hard task.

The people behind these projects who are willing to try have my total respect.

nahnahno•2mo ago
How does this compare to Arrow IPC / Feather v2?
rubenvanwyk•2mo ago
I've never understood why people say the Feather file format isn't meant for "long-term" storage and prefer Parquet for that. Access is much faster with Feather; compression is better with Parquet, but Feather is really good.
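One reason Parquet files tend to be smaller is that Parquet applies lightweight per-column encodings, such as dictionary and run-length encoding, before any general-purpose codec. The sketch below shows the two ideas in plain Python. It is conceptual only: Parquet's actual RLE/bit-packing hybrid layout differs, and the function names here are illustrative, not from any library.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a list of distinct values."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return [tuple(r) for r in runs]


def decode(dictionary, runs):
    """Invert both steps to recover the original column."""
    return [dictionary[c] for c, n in runs for _ in range(n)]


column = ["NL", "NL", "NL", "DE", "DE", "NL"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
print(dictionary)  # ['NL', 'DE']
print(runs)        # [(0, 3), (1, 2), (0, 1)]
assert decode(dictionary, runs) == column
```

Low-cardinality, sorted, or clustered columns shrink dramatically under these encodings, whereas a format that keeps the in-memory layout on disk gives that up in exchange for zero-decode reads.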
sheepscreek•2mo ago
Honestly, I think Arrow makes Feather redundant. To answer your question, Parquet is optimized for storage on disk: it can store data compressed to take less space, and may include clever tricks or some form of index for querying data in the file. Feather, on the other hand, is optimized for loading into memory. It uses the same representation on disk as it does in memory, with very little compression (if any). It's not optimized for disk at all. BUT you can memory-map a Feather file and randomly access any part of it in O(1) time (I believe, but do your own due diligence :)
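The O(1) random-access claim follows from storing fixed-width values contiguously, exactly as they sit in memory: row `i` lives at byte offset `i * width`, so a memory-mapped read jumps straight there. Below is a minimal sketch with a single raw little-endian int64 column. It deliberately omits everything a real Arrow IPC / Feather file also contains (schema, record-batch metadata, alignment padding, validity bitmaps), and `write_column` / `read_row` are hypothetical helpers.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # one little-endian int64 per row


def write_column(path, values):
    # Store the column as it would sit in memory: fixed-width, contiguous,
    # uncompressed. Assumes a non-empty column (mmap cannot map 0 bytes).
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def read_row(path, i):
    # Map the file and read directly at byte offset i * 8; no earlier rows
    # are parsed or decompressed, so the cost does not depend on i.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return ITEM.unpack_from(mm, i * ITEM.size)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.bin")
    write_column(path, [10, 20, 30, 40])
    print(read_row(path, 2))  # 30
```

A compressed page has no such fixed offsets: to reach row `i` you generally have to decompress the page containing it, which is the trade-off the comment describes.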
ozgrakkurt•2mo ago
It is wildly more complex
kipukun•2mo ago
The cuDF interop in the roadmap [1] will be huge for my workloads. XGBoost has the fastest inference time on GPUs, so a fast path straight from these Vortex files to GPU memory seems promising.

[1] https://github.com/vortex-data/vortex/issues/2116

reactordev•2mo ago
Can you explain how it’s faster? GPU memory is just a blob with an address. Is it because the loading algorithms for vortex align better with XGBoost or just plain uploading to the GPU?
robert3005•2mo ago
If you have a GPU-friendly format, you can send compressed data over PCI-E and then decompress it on the GPU. Your overall throughput increases, since PCI-E bandwidth is the limiting factor of the overall system.
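The argument can be put as back-of-the-envelope arithmetic. Assuming transfer and on-GPU decompression are pipelined, the slower stage bounds the rate: the bus delivers `link × ratio` bytes of *uncompressed* data per second, capped by how fast the GPU can decompress. The bandwidth and decompression figures below are hypothetical round numbers, not measurements of Vortex or any particular hardware.

```python
def effective_throughput(link_gbps, compression_ratio, decompress_gbps):
    """Uncompressed GB/s reaching GPU memory.

    link_gbps:       bus bandwidth in GB/s of compressed bytes (assumed value)
    compression_ratio: uncompressed size / compressed size
    decompress_gbps: GPU decompression speed in GB/s of output (assumed value)
    Assumes transfer and decompression overlap, so the slower stage wins.
    """
    return min(link_gbps * compression_ratio, decompress_gbps)


# Hypothetical numbers: a 25 GB/s link and a 200 GB/s GPU decompressor.
print(effective_throughput(25, 1, 200))  # 25  (uncompressed: bus-bound)
print(effective_throughput(25, 4, 200))  # 100 (4x ratio: still bus-bound, 4x faster)
```

Whether a given file format helps then comes down to whether its compressed layout can be decompressed on the GPU fast enough to stay above the link rate.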
reactordev•2mo ago
That doesn’t explain how Vortex is faster. Yes, you should send compressed data to the GPU and let it decompress. You should maximize your PCI-E throughput to minimize latency in execution, but what does Vortex bring? Other than Parquet bad, Vortex good.
kipukun•2mo ago
XGBoost is just faster on the GPU, regardless of the file format. A sibling post also pointed out compression helping out on bandwidth.
xigoi•2mo ago
Can we stop with the cringe emojis at the start of every heading?
mrbluecoat•2mo ago
I guess not surprising from a project that combines Polars & Vortex
kh_hk•2mo ago
I tend to agree, but I don't see this one as any of the worst offenders, unless I am missing something.

This readme has what, max two or three emojis? Compare that to most LLM-generated readmes with a zillion emojis for every single feature.

xigoi•2mo ago
They seem to have removed the emojis since I posted my comment: https://github.com/vortex-data/vortex/commit/8294dd665869a72...
kh_hk•2mo ago
Thanks
rubenvanwyk•2mo ago
Vortex and Lance both seem really cool but will have to infiltrate either the Delta or Iceberg specs to become mainstream.
robert3005•2mo ago
Can’t wait for https://github.com/apache/iceberg/issues/12225 to merge so there’s an api to integrate against
meehai•2mo ago
Can you append new columns to a file stored on disk without reading it all into memory? Somehow this is beyond Parquet's capabilities.
robert3005•2mo ago
The default writer will decompress the values; however, right now you can implement your own write strategy that avoids doing so. We plan on adding that as an option since it’s quite common.
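Why appending a column need not touch existing data: in a layout where column chunks come first and a footer at the end records each chunk's offset, adding a column only means dropping the old footer, writing the new column's bytes, and writing an updated footer. The sketch below uses a made-up toy layout (JSON-encoded columns, a JSON footer, and an 8-byte footer length at the end); it is not Vortex's file layout or write-strategy API, and all function names are hypothetical. It is also not crash-safe, since the old footer is truncated before the new one is written.

```python
import json
import os
import struct

LEN = struct.Struct("<Q")  # 8-byte little-endian footer length at end of file


def _read_footer(f):
    # Read only the tail of the file: footer length, then the footer itself.
    f.seek(-LEN.size, os.SEEK_END)
    (footer_len,) = LEN.unpack(f.read(LEN.size))
    footer_start = f.seek(-(LEN.size + footer_len), os.SEEK_END)
    return json.loads(f.read(footer_len)), footer_start


def _write_footer(f, footer):
    blob = json.dumps(footer).encode()
    f.write(blob)
    f.write(LEN.pack(len(blob)))


def create(path):
    with open(path, "wb") as f:
        _write_footer(f, {"columns": {}})


def append_column(path, name, values):
    """Add a column without reading or rewriting any existing column bytes."""
    with open(path, "r+b") as f:
        footer, footer_start = _read_footer(f)
        f.seek(footer_start)
        f.truncate()  # drop the old footer; earlier column chunks stay in place
        data = json.dumps(values).encode()
        footer["columns"][name] = {"offset": footer_start, "length": len(data)}
        f.write(data)
        _write_footer(f, footer)


def read_column(path, name):
    with open(path, "rb") as f:
        footer, _ = _read_footer(f)
        meta = footer["columns"][name]
        f.seek(meta["offset"])
        return json.loads(f.read(meta["length"]))
```

Usage: `create(p)`, then `append_column(p, "a", [1, 2, 3])` and later `append_column(p, "b", ["x"])`; the bytes of column `a` at offset 0 are never re-read or rewritten. A real format would also need compressed column chunks and atomic footer replacement.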
andyferris•2mo ago
One thing I found interesting is the logical type system doesn't seem to include sum types or unions, unlike Arrow etc.

I'd generally encourage new type systems to include sum types as a first-class concept.

infogulch•2mo ago
I wonder if a columnar storage format should implement sum types as a struct of arrays where only one array has a non-null value at each index.
ozgrakkurt•2mo ago
Arrow has two variants of it, and this is one of them. The other variant has a separate offsets array that you use to index into the active “field” array, so it is slower to process in most cases but more compact.
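The two variants described above correspond to Arrow's sparse and dense union layouts. Here is a minimal pure-Python sketch of both; the class names are hypothetical, plain lists stand in for Arrow buffers, and `None` is used as a placeholder in the unused sparse slots. The sparse form indexes every child directly by row; the dense form stores each value once and pays an extra lookup through `offsets`.

```python
from dataclasses import dataclass


@dataclass
class SparseUnion:
    """Struct-of-arrays sum type: every child has one slot per row,
    and type_ids[i] says which child holds row i's live value."""
    type_ids: list[int]
    children: list[list]

    def __getitem__(self, i):
        return self.children[self.type_ids[i]][i]


@dataclass
class DenseUnion:
    """Children store only the rows of their own type; offsets[i] locates
    row i inside children[type_ids[i]] (one extra indirection per access)."""
    type_ids: list[int]
    offsets: list[int]
    children: list[list]

    def __getitem__(self, i):
        return self.children[self.type_ids[i]][self.offsets[i]]


def to_dense(u: SparseUnion) -> DenseUnion:
    """Drop the unused slots of a sparse union, recording offsets instead."""
    children = [[] for _ in u.children]
    offsets = []
    for i, t in enumerate(u.type_ids):
        offsets.append(len(children[t]))
        children[t].append(u.children[t][i])
    return DenseUnion(list(u.type_ids), offsets, children)


# Rows 7, "x", 8, "y": child 0 holds ints, child 1 holds strings.
sparse = SparseUnion([0, 1, 0, 1], [[7, None, 8, None], [None, "x", None, "y"]])
dense = to_dense(sparse)
print(dense.children)                # [[7, 8], ['x', 'y']]
print(dense.offsets)                 # [0, 0, 1, 1]
print([dense[i] for i in range(4)])  # [7, 'x', 8, 'y']
```

With many variants the sparse form wastes a slot per row in every child, which is why the dense form is more compact despite the extra indirection.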