
We built another object storage

https://fractalbits.com/blog/why-we-built-another-object-storage/
105•fractalbits•4h ago

Comments

fractalbits•4h ago
github page: https://github.com/fractalbits-labs/fractalbits-main
andai•2h ago
HN's version of this title is unintentional comedy :)
imvetri•2h ago
Can you say how this is different, in terms of data structure, from a conventional one, please?
dbacar•1h ago
One can only hope this does not go in the same direction as Minio once they gain momentum.
whinvik•1h ago
Interesting. Have you seen any benefits from using io-uring? It seems io-uring is constantly talked about, but no one seems to be really using it in anger.
6r17•1h ago
Io-uring has its fair share of CVEs; I'm wondering if people are checking these out, because the goal is not just to make something fast, but fast & secure. It's a bit of a grey area, in my opinion, for prod on public machines. If anyone has a counter view on this, I'm genuinely curious; maybe I'm overcautious?

ps : there are actually other faster and more secure options than io-uring but I won't spoil ;)

hansvm•1h ago
My understanding is that the iouring CVEs are about local privilege escalation, not being appropriately sandboxed, etc. If you're only running code you trust on machines with iouring enabled then you're fine (give or take "defense in depth").

Is that not accurate?

jamiesonbecker•1h ago
These questions are meant to be constructively critical, but not hyper-critical: I'm genuinely interested and a big fan of open-source projects in this space:

* In terms of a high-performance AI-focused S3 competitor, how does this compare to NVIDIA's AIstore? https://aistore.nvidia.com/

* What's the clustering story? Is it complex like ceph, requires K8s like AIstore for full functionality, or is it more flexible like Garage, Minio, etc?

* You spend a lot of time talking about performance; do you have any benchmarks?

* Obviously most of the page was written by ChatGPT: what percentage of the code was written by AI, and has it been reviewed by a human?

* How does the object storage itself work? How is it architected? Do you DHT, for example? What tradeoffs are there (CAP, for example) vs the 1.4 gazillion alternatives?

* Are there any front-end or admin tools (and screenshots)?

* Can a cluster scale horizontally or only vertically (i.e. Minio)?

* Why not instead just fork a previous version of Minio and then put a high-speed metadata layer on top?

* Is there any telemetry?

* Although it doesn't matter as much for my use case as for others, what is the specific jurisdiction of origin?

* Is there a CLA and does that CLA involve assigning rights like copyright (helps prevent the 'rug-pull' closing-source scenario)?

* Is there a non-profit Foundation, goal for CNCF sponsorship or other trusted third-party to ensure that the software remains open source (although forks of prior versions mostly mitigates that concern)?

Thanks!

mrweasel•1h ago
> the page was written by ChatGPT

I wonder if that's why it's all over the place. Meta engine written in Zig, okay, do I need to care? Gateway in Rust... probably a smart choice, but why do I need to be able to pick between web frameworks?

> Most object stores use LSM-trees (good for writes, variable read latency) or B+ trees (predictable reads, write amplification). We chose a radix tree because it naturally mirrors a filesystem hierarchy

Okay, so are radix trees good for writes and reads, bad for both, or somewhere in between?

What is "physiological logging"?

randallsquared•1h ago
A hybrid of physical logging, which is logging page-by-page changes, and logical logging, which is recording the activity performed at an intent level. If you do both of these, it's apparently "physiological", which I imagine was first conceived of as "physio-logical".

I could only find references to this in database systems course notes, which may indicate something.
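For readers who want something concrete: database textbooks usually describe physiological logging as "physical to a page, logical within a page". Each log record names exactly one page (the physical part) but describes the change to that page as an operation rather than a byte diff (the logical part). Below is a minimal sketch of that idea. The `Page` and `LogRecord` layouts and the `redo` helper are hypothetical names made up for illustration, not anything taken from FractalBits.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Page:
    """A page holding key/value slots (hypothetical layout)."""
    page_id: int
    slots: Dict[str, str] = field(default_factory=dict)
    lsn: int = 0  # log sequence number of the last change applied


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    page_id: int  # physical part: names exactly one page
    op: str       # logical part: the operation to perform inside that page
    key: str
    value: Optional[str] = None


def redo(pages: Dict[int, Page], record: LogRecord) -> None:
    """Reapply one record; skip it if the page already reflects it."""
    page = pages[record.page_id]
    if page.lsn >= record.lsn:
        return  # already applied, so replay is idempotent
    if record.op == "insert":
        page.slots[record.key] = record.value
    elif record.op == "delete":
        page.slots.pop(record.key, None)
    else:
        raise ValueError(f"unknown op: {record.op}")
    page.lsn = record.lsn


pages = {7: Page(page_id=7)}
log = [
    LogRecord(lsn=1, page_id=7, op="insert", key="a", value="1"),
    LogRecord(lsn=2, page_id=7, op="insert", key="b", value="2"),
    LogRecord(lsn=3, page_id=7, op="delete", key="a"),
]

# Replay the log twice: the second pass is a no-op because of the LSN check.
for _ in range(2):
    for rec in log:
        redo(pages, rec)
```

The record stays small because it does not carry a before/after image of the page, and recovery stays simple because each record touches only one page.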

throwaway894345•55m ago
I’m also curious about the Kubernetes story—specifically how can one run this in Kubernetes?
ChocolateGod•1h ago
so they added a metadata engine to S3?

How does that compare to something like JuiceFS?

Aperocky•1h ago
So they built an object storage to replace filesystem.

And in "Why Not Just Use a Filesystem?", the answer they gave is "the line is already blurring" and "industry is converging".

The line may be blurring, but as mentioned there is still a clear-cut use case for a file system. Or, if higher access speed is warranted, just add more RAM to the system and cache the files. It will still cost less, even at the current cost of RAM.

zozbot234•10m ago
AIUI, one obvious difference between object storage and file system (beyond things like support for directories and file name lookups, which OP talks about already) is that an object storage has only atomic file store/replace, whereas a file system has to support arbitrary edits on both file content and directories/metadata.
oersted•1h ago
Small objects and low latency.

Why not use any of the great KV stores out there? Or a traditional database even.

People use object storage for the low cost, not because it is a convenient abstraction. I suspect some people use the faster expensive S3 simply as a stopgap. Because they started with object storage, the requirements changed, it is no longer the right tool for the job but it is a hassle to switch, and AWS is taking advantage of their situation. I suppose that offering an alternative to those people for a non-extortionate price is a decent business model, but I am not sure how big that market is or how long it will last. And it's not really a question of better tech, I'm sure AWS could make it a lot cheaper if they wanted to.

But object storage at the price of a database with the performance of a database, is just a database, and I doubt that quickly reinventing that wheel yielded anything too competitive.

kburman•1h ago
I feel like this product is optimizing for an anti-pattern.

The blog argues that AI workloads are bottlenecked by latency because of 'millions of small files.' But if you are training on millions of loose 4KB objects directly from network storage, your data pipeline is the problem, not the storage layer.

Data Formats: Standard practice is to use formats like WebDataset, Parquet, or TFRecord to chunk small files into large, sequential blobs. This negates the need for high-IOPS metadata operations and makes standard S3 throughput the only metric that matters (which is already plentiful).

Caching: Most high-performance training jobs hydrate local NVMe scratch space on the GPU nodes. S3 is just the cold source of truth. We don't need sub-millisecond access to the source of truth, we need it at the edge (local disk/RAM), which is handled by the data loader pre-fetching.

It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf
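As a rough illustration of the batching argument above, here is a minimal sketch that packs many small objects into one uncompressed tar shard and streams them back sequentially, using only Python's standard `tarfile` module. The sample names, the 4 KiB payload size, and the helpers `pack_samples` and `iter_samples` are assumptions for the example. A real pipeline would write shards to storage rather than keep them in memory.

```python
import io
import tarfile
from typing import Dict, Iterator, Tuple


def pack_samples(samples: Dict[str, bytes]) -> bytes:
    """Pack many small objects into one uncompressed tar blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def iter_samples(shard: bytes) -> Iterator[Tuple[str, bytes]]:
    """Stream (name, payload) pairs back in the order they were written."""
    # "r|" reads the archive as a forward-only stream, with no seeking.
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r|") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:
                yield member.name, extracted.read()


# 100 hypothetical 4 KiB samples become a single sequential blob.
samples = {f"sample-{i:05d}.bin": bytes([i % 256]) * 4096 for i in range(100)}
shard = pack_samples(samples)
restored = dict(iter_samples(shard))
```

One large read replaces 100 small ones, at the cost of fixing the read order to the write order.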

jeremyjh•54m ago
Yeah, I was a bit lost from the introduction. High performance object stores are "too expensive?" We live in an era where I can store everything forever and query it in human-scale time frames at costs that are far less than what we paid for much worse technologies a decade ago. But I was thinking of datalakes, not vector stores or whatever they are trying to solve for AI.
Scubabear68•47m ago
Loved your sentence at the end about tar -cvf.

Every generation seems to have to learn the lesson about batching small inputs together to keep throughput up.

hodgesrm•17m ago
> It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf

That doesn't work on Parquet or anything compressed. In real-time analytics you want to load small files quickly into a central location where they can be both queried and compacted (different workloads) at the same time. This is hard to do in existing table formats like Iceberg. Granted not everyone shares this requirement but it's increasingly important for a wide range of use cases like log management.

fulafel•12m ago
You can do app optimizations to work with object databases that are slow for small objects, or you can have a fast object database - doesn't seem that black and white. If you can build a fast object database that is robust and solves that problem well, it's (hopefully) a non leaky abstraction that can warrant some complexity inside.
deliciousturkey•8m ago
In AI training, you want to sample the dataset in arbitrary fashion. You may want to arbitrarily subset your dataset for specific jobs. These are fundamentally opposed demands compared to linear access: to make your tar-file approach work, the data has to be ordered to match the sample order of your training workload, coupling data storage and sampler design.

There are solutions for this, but the added complexity is big. In any case, your training code and data storage become tightly coupled. If you can avoid it by having a faster storage solution, at least I would be highly appreciative of it.
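One of the workarounds alluded to here can be sketched as a packed blob plus a side index of offsets, so that arbitrary samples can be fetched without reading the blob linearly. This is only an illustration under assumptions: `pack_with_index` and `read_sample` are hypothetical helpers, and the in-memory slice stands in for what would be a ranged read against real storage.

```python
import random
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]


def pack_with_index(samples: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate payloads into one blob and record (offset, length) per key."""
    blob = bytearray()
    index: Index = {}
    for key, payload in samples.items():
        index[key] = (len(blob), len(payload))
        blob += payload
    return bytes(blob), index


def read_sample(blob: bytes, index: Index, key: str) -> bytes:
    """Random access: one index lookup, then one ranged read."""
    offset, length = index[key]
    return blob[offset:offset + length]


samples = {f"s{i}": f"payload-{i}".encode() for i in range(1000)}
blob, index = pack_with_index(samples)

# Draw an arbitrary subset, as a sampler might for one training job.
rng = random.Random(0)
subset: List[str] = rng.sample(sorted(index), k=10)
batch = [read_sample(blob, index, key) for key in subset]
```

The index is the coupling in question: the training code now has to know about the packing format, and the index has to be rebuilt whenever the shard is rewritten.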

tsuru•1h ago
Every time I hear hierarchical storage, I can't help but think "It's all coming back to MUMPS, isn't it?"
hansvm•1h ago
Nice. I was looking at building an object store myself. It's fun to see what features other people think are important.

I'm curious about one aspect though. The price comparison says storage is "included," but that hides the fact that you only have 2TB on the suggested instance type, bringing the storage cost to $180/TB/mo if you pay each year up-front for savings, $540/TB/mo when you consider that the durability solution is vanilla replication.

I know that's "double counting" or whatever, but the read/write workloads being suggested here are strange to me. If you only have 1875GB of data (achieved with 3 of those instances because of replication) and sustain 10k small-object (4KiB) QPS as per the other part of the cost comparison, you're describing a world where you read and/or write 50x your entire storage capacity every month.

I know there can be hot vs cold objects or workloads where most data is transient, but even then that still feels like a lot higher access amplification than I would expect from most workloads (or have ever observed in any job I'm allowed to write about publicly). With that in mind, the storage costs themselves actually dominate, and you're at the mercy of AWS not providing any solution even as cheap as 6x the cost of a 2-year amortized SSD (and only S3 comes close -- it's worse when you rent actual "disks," doubly so when they're high-performance).
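The back-of-the-envelope numbers in this comment can be reproduced in a few lines. The inputs are the figures quoted above (10k QPS, 4 KiB objects, 1875 GB stored, $180/TB/mo, 3 replicas). The 30-day month and decimal gigabytes are assumptions made for the calculation.

```python
QPS = 10_000
OBJECT_BYTES = 4 * 1024             # 4 KiB per request
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
STORED_BYTES = 1875 * 10**9         # assumption: decimal GB
PRICE_PER_TB_MONTH = 180            # USD, single copy, from the comment
REPLICAS = 3                        # plain replication, from the comment

# Total bytes read and/or written per month at a sustained request rate.
bytes_per_month = QPS * OBJECT_BYTES * SECONDS_PER_MONTH

# How many times the stored data set is "turned over" each month.
turnover = bytes_per_month / STORED_BYTES

# Storage price per usable TB once every byte is stored three times.
replicated_price = PRICE_PER_TB_MONTH * REPLICAS

print(f"traffic per month: {bytes_per_month / 1e12:.1f} TB")
print(f"turnover: {turnover:.1f}x stored capacity")
print(f"replicated storage price: ${replicated_price}/TB/mo")
```

Under these assumptions the turnover lands in the mid-50s, in line with the "50x" estimate, and the replicated price matches the $540/TB/mo figure.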

websiteapi•1h ago
it's always interesting to me how our profession keeps reimplementing the same sort of thing over and over and over again. is it just inherent to the ease with which our experiments can be conducted?
firesteelrain•11m ago
How does this compare to Dell’s ObjectScale?

We eliminated MinIO on vSAN in favor of ObjectScale for on-prem.

orliesaurus•10m ago
I'm more interested in the design philosophy behind these projects than which benchmarks top the charts...

A lot of the high performance S3 alternatives trumpet crazy IOPS numbers, but the devil is in how they handle metadata and consistency. FractalBits says it offers strong consistency and atomic rename ([Why We Built Another Object Storage (And Why It's Different)](https://fractalbits.com/blog/why-we-built-another-object-sto...)), which makes it different from most eventual consistency S3 clones. That implies a full‑path indexing metadata engine (something they mention in a LinkedIn post). That’s a really interesting direction because it potentially avoids some of the inode bottlenecks you see in Ceph and MinIO.

BUT the real question for me is long‑term sustainability. Running your own object store is a commitment. Who's maintaining it when the original team moves on? It's great to see new entrants with ideas, ALSO it would be reassuring if there were clear governance and a non‑profit steward at some point.

I don't mind if something uses AI to draft marketing copy... as long as the code is readable, reviewed, and licensed in a way that keeps it open. The space is crowded, and differentiation often comes down to the less flashy stuff: operational tooling, monitoring, easy deployment across zones, and how it fails. I'm curious to see where this one goes.
