Spiral

https://spiraldb.com/post/announcing-spiral
139•jorangreef•1h ago

Comments

all2•1h ago
Spelling error "sttill"

> P.S. If you're sttill managing data in spreadsheets, this post isn't for you. Yet.

---

Since I discovered the ECS pattern, I've been curious about backing it with a database. One of the big issues seems to be IO on the database side. I wonder if Spiral might solve this issue.

lordnacho•1h ago
If the ECS data is grid-like, perhaps you could use a columnar database for time series?

Then you could save every single state change and scroll back and forth. But I'm not sure if you were looking for that.
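To make the "save every state change and scroll back and forth" idea concrete, here is a minimal sketch of a column-oriented change log. Everything in it (the `ColumnarHistory` class, its method names, the sample entities) is hypothetical and invented for illustration; it is not how Spiral, Vortex, or any particular columnar database works.

```python
from bisect import bisect_right


class ColumnarHistory:
    """Append-only, column-oriented log of component changes (hypothetical)."""

    def __init__(self):
        # One parallel list per column instead of one record object per row.
        self.ticks = []
        self.entities = []
        self.values = []

    def record(self, tick, entity, value):
        # Ticks must arrive in non-decreasing order so state_at can binary search.
        if self.ticks and tick < self.ticks[-1]:
            raise ValueError("ticks must be non-decreasing")
        self.ticks.append(tick)
        self.entities.append(entity)
        self.values.append(value)

    def state_at(self, tick):
        """Return {entity: value} as of the given tick (inclusive)."""
        end = bisect_right(self.ticks, tick)
        state = {}
        for entity, value in zip(self.entities[:end], self.values[:end]):
            state[entity] = value  # later changes overwrite earlier ones
        return state


history = ColumnarHistory()
history.record(0, "player", (0, 0))
history.record(1, "enemy", (5, 5))
history.record(2, "player", (1, 0))
```

Calling `history.state_at(t)` for any earlier or later tick replays the log up to that point, which is the "scroll back and forth" part. A real system would add snapshots so it does not have to replay from the start every time.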

harwoodr•58m ago
Have a look at something like spacetimeDB - caveat, I've only read about it and not directly used it:

https://github.com/ClockworkLabs/SpacetimeDB

SomeHacker44•1h ago
"100KiB images"... This is odd. Most of my images are 2.5-4 MB. My raw images are 3-10x larger.
turnsout•1h ago
I bet this refers to some common training use case that leverages 512px or 1024px images. Or it’s just Palantir scanning security camera frames.
pauldix•1h ago
I've been following this team's work for a while and what they're doing is super interesting. The file format they created and put into the LF, Vortex, is a very welcome innovation in the space: https://github.com/vortex-data/vortex

I'm excited to start doing some experimentation with Vortex to see how it can improve our products.

Great stuff, congrats to Will and team!

dist-epoch•1h ago
https://vortex.dev doesn't work in my Firefox:

Application error: a client-side exception has occurred while loading vortex.dev (see the browser console for more information).

Console: unable to create webgl context

arusahni•1h ago
Works for me. Mozilla/5.0 (X11; Linux x86_64; rv:142.0) Gecko/20100101 Firefox/142.0
miloignis•52m ago
Presumably you don't have WebGL enabled or supported - the main page is just a cute 3D landing page.

You may be interested in https://github.com/vortex-data/vortex which of course has an overview and links to their docs and benchmark pages.

reactordev•1h ago
Anyone that can improve upon the parquet hell that is my life is gladly welcomed...
riku_iki•30m ago
Why don't you like Parquet?
paxys•1h ago
Wasn't "3.0" supposed to be crypto? Is it AI now? It's hard to keep track.
ionwake•51m ago
I think AI is 4.0

EDIT> Maybe it's like how some people call time the 4th dimension when there is in fact a 4th spatial dimension. So I guess if this is the 3rd data dimension, what is the 4th one?

jppope•50m ago
I think some of the crypto companies tried to get cute and leapfrog 3.0 going straight to 4.0, so that would put us at either 5.0, 4.0, 3.1, 2.2, or 2.1 depending on how you feel about the crypto space, and which groups you were validating
bee_rider•44m ago
No, Web 3.0 was the Semantic Web. Thankfully, the silly idea of having major-number versions for the entire internet died when that didn't happen. Now we can safely ignore anybody who tries to do it.
adfm•25m ago
You’re conflating concepts. FWIW, Web3 is snake oil or wishful thinking at best. As much as people like to bang on the old Web 2.0, it still holds up conceptually. And if you only know it as a buzzword, I suggest you go back and familiarize yourself with it if you’re looking for incremental change.

Who knows, maybe a Web 3.1 will deliver us from Enshittification.

holoduke•1h ago
So basically this is a file system that runs on your gpu?
djfobbz•1h ago
So this Vortex engine is a combination of OLTP and OLAP on steroids?
maxmcd•35m ago
Do they mention transactions anywhere? Maybe it will be OLAP?
didibus•14m ago
It sounded only OLAP from the article.
cryptonector•57m ago
I can't tell what this is about.
dkdbejwi383•45m ago
Do you remember the days of “mongodb is web-scale”? It’s that but “spiral is ai-scale”
nwhnwh•24m ago
So it will be irrelevant after a few years?
zzzeek•17m ago
maybe just a few months, AI scale is much faster than web scale of course
znort_•36m ago
"I've been building data systems for long enough to be skeptical of “revolutionary” claims, and I’m uncomfortable with grandiose statements like “Built for the AI Era”. Nevertheless, ..."

... i'm gonna make revolutionary claims and grandiose statements like "built for the ai era".

bee_rider•35m ago
Probably either overcoming giant robots with the power of friendship and a giant drill, or a cursed village with an obsession-inducing whirlpool.
riku_iki•31m ago
My reading is that it will be some hyper-performant DB, thanks to very low-level optimization that exploits recent hardware advancements, plus unification and simplification of formats and pipelines.
didibus•9m ago
I think I understood it as the database will basically store data in a binary format that can be fed into the GPU directly, and will also be optimized for streaming/batching large chunks of data at once.

So it's "optimized for machines to consume" meaning the GPU.

Their use case was training ML models where you need to feed the GPU massive datasets as part of training.

They seem to claim that training is now bottlenecked by how quickly you can feed the GPU, and that otherwise the GPU is basically "waiting on IO" most of the time and not actually computing, because the time goes into grabbing the next piece of data, transforming it for GPU consumption, and then feeding it into the GPU.

But I'm not an expert, this is just my take from the article.
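One generic way to reduce that "waiting on IO" time is to load the next batch while the current one is being processed. The sketch below illustrates the overlap with a background thread and a bounded queue; `prefetch` and `load_batches` are hypothetical names, the loader is a stand-in for read + decode + transform, and none of this is taken from the Spiral article.

```python
import queue
import threading


def prefetch(batches, depth=2):
    """Yield items from `batches`, loading up to `depth` ahead in a background thread."""
    buffer = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        # Note: this sketch does not forward exceptions raised by the loader.
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


def load_batches(n):
    # Hypothetical stand-in for reading, decoding, and transforming a batch.
    for i in range(n):
        yield [i, i + 1, i + 2]


# The consumer (here just sum) plays the role of the GPU step.
totals = [sum(batch) for batch in prefetch(load_batches(4))]
```

The bounded queue is the design choice that matters: it lets the loader run ahead of the consumer without letting it fill memory with batches nobody has consumed yet.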

4ndrewl•56m ago
The "three eras of database systems" start with a client-server Postgres, but that misses the daddy of the generation before it: xBase (i.e. dBase, FoxPro, etc.).
khaledh•32m ago
It goes way before that. It starts with IDS (Integrated Data Store) from GE (1964), which was a network database system. Next was IBM's hierarchical database system IMS (Information Management System, 1966), still in use today. Then the CODASYL model (late 1960s), which was an effort to standardize the network model. And then Codd came up with the relational model in the early 70s, upon which an explosion of database systems were built (first is IBM System R, SQL, Oracle, DB2, Ingres). Then came the PC-based database systems you mentioned.
4ndrewl•26m ago
Oh for sure. To suggest we're only on generation 3 of "databases" is way off the mark.
spankalee•39m ago
I'm curious... I'm not a database or AI engineer. The last time I did GPU work was over a decade ago. What is the point of the "saturate an H100" metric?

I would think that a GPU isn't just sitting there waiting on a process that's in turn waiting for one query to finish to start the next query, but that a bunch of parallel queries and scans would be running, fed from many DB and object store servers, keeping the GPUs as utilized as possible. Given how expensive GPUs are, it would seem like a good trade to buy more servers to keep them fed, even if you do want to make the servers and DB/object store reads faster.

vouwfietsman•12m ago
My guess is that just the raw data size, combined with the physical limitations of your RU, makes it hard for the GPU to be fully utilized. Instead you will always be stuck on CPU (decompressing/interpreting/uploading parquet) or bandwidth (transfer from s3) being the bottleneck.

Seems that they are targeting a low-to-no overhead path from s3 bucket to GPU, by targeting: same compression/faster random access, streamed encoding from S3 while in flight, zero copy to GPU.

I'm not 100% clear on the details, but I doubt that they can actually saturate the CPU/GPU bus. More likely they just saturate GPU utilization, which itself depends on multiple possible bottlenecks but generally not on bus bandwidth.

That's not criticism: it literally means you can't do better unless you improve the GPU utilization of your AI model.

otterley•11m ago
The idea is that in a pipeline of work, the throughput is limited by the slowest component. H100 GPUs have a lot of memory bandwidth. The question then becomes how to eliminate any bottlenecks between the data store and the GPU's memory.

What's unanswered in the blog post is how a new storage format eliminates the bottleneck. Once you eliminate storage bottlenecks, the remaining bottleneck is usually the PCI bus that sits between the host memory and the GPU, and they can't solve that themselves. It might be that their database format is more space-efficient, which makes bus transfers more efficient and makes better use of the GPU's onboard memory.
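The "slowest component" argument can be written as back-of-the-envelope arithmetic. The sketch below uses made-up stage rates purely for illustration (they are not measurements of an H100, S3, or Vortex), and both function names are hypothetical.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck stage, rate): a serial pipeline runs at its slowest stage."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


def effective_rate(link_rate, compression_ratio):
    """Logical data rate over a link when the data crosses it still compressed.

    Assumes decompression on the far side is not itself the bottleneck.
    """
    return link_rate * compression_ratio


# Made-up rates in GB/s, for illustration only.
stages = {"object_store": 5.0, "cpu_decode": 3.0, "host_to_gpu": 25.0}
name, rate = pipeline_throughput(stages)
```

With these invented numbers the CPU decode stage caps the pipeline, and `effective_rate(25.0, 2.0)` shows why a more space-efficient format helps even when the bus itself cannot be made faster.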

They've also left unanswered how they're going to commercialize it, but my guess is that they're going to use a proprietary fork of Vortex that provides extra performance or features. The open-source release gives its customers a Reason to Believe, in marketing parlance.

otterley•28m ago
When should we expect its commercial fork of Vortex that angers the most important segment of contributors and users?
zzzeek•18m ago
This links to a super long-winded blog post that sounds more like a toast at a wedding, so I went to the main page to try to see what their product is, and you just get a blitz of fancy animations of table diagrams and things and lots of very cheap-sounding slogans pushed out like "Works with any data! Fully XYZ 2.0 compliant! Ties your shoes!"

Basically I'm not sure where the product is hiding under all of this bluster, but this doesn't feel very "hacker"-Y
