mayankchhabra•17h ago
We’ve spent the last 5 years building umbrelOS to make self-hosting accessible. Yesterday, we launched our dream hardware: Umbrel Pro.
Specs:
- 4x NVMe SSD slots for storage (tool-less operation)
- Intel N300 CPU (8 cores, 3.8GHz)
- 16GB LPDDR5 RAM
- 64GB onboard eMMC with umbrelOS
The chassis is milled from a single block of aluminum and framed with real American Walnut wood.
Here is a video of the manufacturing process if you want to nerd out on the machining details: https://youtu.be/4IAXfgBnRe8
Also, we built a "FailSafe" mode in umbrelOS, powered by ZFS raidz1. The coolest part is the flexibility: you can start with a single SSD and enable RAID later when you add the second drive (without wiping data), or enable it from day one if you start with multiple drives.
We also really obsessed over the thermal design. The magnetic lid on the bottom has a thermal pad that makes direct contact with all 4 NVMe SSDs, transferring heat into the aluminum. Air is pulled through the side vents on the lid, flows over the SSDs, then the motherboard/CPU, and exits the back. It runs whisper quiet.
Lots more details on our website, but we’ll be hanging out in the comments to answer any questions :)
f30e3dfed1c9•15h ago
If you start with one SSD, how can you later make that into a raidz1 of two? Also, a raidz1 of two block devices does not seem like a really great idea.
f30e3dfed1c9•15h ago
Another question: the hardware looks pretty nice. Can I run FreeBSD on it?
lukechilds•7h ago
Yes, you can run anything on it.
f30e3dfed1c9•14h ago
FWIW, this is what Gemini thinks you are likely doing. Is this correct, or close?
The Trick: The "Sparse File" Loopback
Since ZFS doesn't allow you to convert a single disk vdev to RAID-Z1, Umbrel's "FailSafe" mode almost certainly uses a sparse file to lie to the system.
Phase 1 (Single Drive): When you set up Umbrel with one 2TB SSD, they don't just create a simple ZFS pool. They likely create a RAID-Z1 pool consisting of your physical SSD and two "fake" virtual disks (large files on the same SSD).
The "Degraded" State: They immediately "offline" or "remove" the fake disks. The pool stays in a DEGRADED state but remains functional. To you, the UI just shows "1 Drive."
Phase 2 (Adding the 2nd Drive): When you plug in the second drive, umbrelOS likely runs a zpool replace command, replacing one of those "fake" virtual disks with your new physical SSD.
Resilvering: ZFS then copies the parity data onto the second disk.
lukechilds•6h ago
Hey, other founder here.
Great question! Close, but not exactly. We do use a sparse file but only very briefly during the transition.
We start with 1 SSD as a single top-level vdev. When you add the second SSD, you choose whether to enable FailSafe. If you don't enable FailSafe, you can just keep adding disks and they'll be added as top-level vdevs, giving you maximum read and write performance since data is striped across them. Very simple, no tricks.
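(In plain ZFS CLI terms, that non-FailSafe path is just adding each new disk as another top-level vdev, roughly the following; the pool and device names are made up for illustration and the exact invocation in umbrelOS may differ:)

    # add the new SSD as another top-level vdev; ZFS stripes data across vdevs
    zpool add tank /dev/nvme1n1
    zpool status tank   # now two single-disk top-level vdevs, no redundancy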
However, if you choose FailSafe when you add your second SSD, we do a bit of ZFS topology surgery, but only very briefly. So you start with a ZFS pool with a single top-level vdev on your current SSD, you've just added a new, unused SSD, and you've chosen to transition to FailSafe mode. First we create a sparse file exactly the same size as your current active SSD. Then we create an entirely new pool with a single top-level raidz1 vdev backed by two devices: the new SSD and the sparse file. The sparse file acts as a placeholder for your current active SSD in the new pool. We then immediately remove the sparse file, so the new pool and its dataset are degraded. Next we take a snapshot of the first dataset and sync the entire snapshot over to the new pool. The system stays live and running off the old pool for this whole process.
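In standard ZFS CLI terms, that first phase looks roughly like this (pool names, dataset names, and device paths are illustrative; the real implementation is wrapped up in umbrelOS, so treat it as a sketch rather than the exact commands we run):

    # 1. sparse placeholder file, same size as the current active SSD
    truncate -s "$(blockdev --getsize64 /dev/nvme0n1)" /tmp/placeholder.img

    # 2. new pool: a two-wide raidz1 vdev made of the new SSD + the placeholder
    zpool create newpool raidz1 /dev/nvme1n1 /tmp/placeholder.img

    # 3. drop the placeholder so the new pool runs degraded (one member missing)
    zpool offline newpool /tmp/placeholder.img

    # 4. snapshot the live dataset and stream it into the new pool
    zfs snapshot oldpool/data@initial
    zfs send oldpool/data@initial | zfs receive newpool/data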
Once that initial sync has completed, we very briefly reboot to switch to the new pool. (We have the entire OS running on a writable overlay on the ZFS dataset.) This is an atomic switchover. Early in the boot process, before the ZFS dataset is mounted, we take an additional snapshot of the old dataset and do an incremental sync over to the new dataset. This is very quick, since it only copies the small changes made since the first snapshot was taken.
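That boot-time catch-up is the standard incremental snapshot send, conceptually something like (same illustrative names as above):

    # capture whatever changed since @initial, before either dataset is mounted
    zfs snapshot oldpool/data@final
    zfs send -i @initial oldpool/data@final | zfs receive -F newpool/data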
Once this sync has completed, the two separate pools contain identical data. We then mount the new pool and boot up with it. At that point we can destroy the old pool and attach the old SSD to the new pool, bringing it out of its degraded state, and the old SSD gets resilvered into the new pool. The user is now booted into a two-wide raidz1 dataset on the new pool, with data bit-for-bit identical to the single-SSD dataset on the old pool they shut down on.
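The final step is conceptually the old SSD taking over the slot the sparse file used to occupy (again, illustrative names, not the literal commands):

    zpool destroy oldpool
    # slot the old SSD in where the placeholder file was, then let it resilver
    zpool replace newpool /tmp/placeholder.img /dev/nvme0n1
    zpool status newpool   # resilver runs; pool goes back to ONLINE when done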
Despite sounding a bit wacky, the transition process is actually extremely safe. Apart from the switchover to the new dataset, the entire process happens in the background with the system online and fully functional. The transition can fail at almost any point and it will gracefully roll back to the single SSD. We only nuke the old single SSD at the very last step, so at every point either we can roll back or the user already has a working raidz1 array.
It sounds bad that the raidz1 goes through a period of degradation, but there is no additional risk here over not doing the transition. The user is coming from a single-disk vdev that already cannot survive a disk failure. We briefly put them through a degraded raidz1 array that also cannot survive a disk loss (no riskier than how they were already operating), and they end up on a healthy raidz1 array that can survive a single disk loss, significantly increasing their safety in a simple and frictionless way.
Two-wide raidz1 arrays also get a bit of a knee-jerk reaction, but it turns out that for our use case the downsides are practically negligible and the upsides are huge. Mirrors basically give you 2x the read speed of a two-disk raidz1, plus less read-intensive rebuilds; everything else is pretty much the same or the differences are negligible. It turns out those benefits don't make a meaningful difference to us. A single SSD can already far exceed the bandwidth needed to fully saturate our 2.5GbE connection, so the additional speed of a mirror is nice but not really noticeable. The absolute killer feature of raidz, however, is raidz expansion. Once we've moved to a two-disk-wide raidz1 array, which is not the fastest possible two-disk configuration but more than fast enough for what we need, we can add extra SSDs and do online expansions to a 3-disk raidz1 array, then a 4-disk raidz1 array, and so on. As you add more disks to the raidz1 array, reads and writes are striped across n-1 disks, so with 4 disks you exceed the mirror's performance benefits anyway.
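For the curious, that expansion step is the OpenZFS raidz expansion feature, where a new disk is attached directly to the existing raidz vdev, roughly (illustrative names; "raidz1-0" is the typical auto-generated vdev name):

    # grow the existing 2-wide raidz1 vdev to 3-wide, online
    zpool attach newpool raidz1-0 /dev/nvme2n1
    zpool status newpool   # shows expansion progress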
In theory we could start with one SSD, migrate to a mirror when the second SSD is added, and then migrate again to a 3-disk raidz1 array using the sparse-file trick. However, that's extra complexity for negligible improvement. And when moving from the mirror to the raidz1, you degrade the user's pool AFTER you've told them they're running FailSafe, which changes the transition from an operation with practically zero additional risk into an extremely high-risk one.
Ultimately, we think this design gives us the simplest consumer RAID implementation with the highest safety guarantees that exist today. We provide ZFS-level data assurance, with Synology SHR-style one-by-one disk expansion, in an extremely simple and easy-to-use UI.
f30e3dfed1c9•6h ago
Thanks for the thorough answer. It is a little wacky and complicated but I agree it should be safe. I'm not really in the target market for your software but the hardware does look very nice. Good luck with it.