Anyway, this looks like a great idea and might have a chance at solving the economic issue with running nodes for cheap inference and getting paid for it.
The key question here is how they avoid the outside computer being able to view the memory of the internal process:
> An in-process inference design that embeds the inference engine directly in a hardened process, eliminating all inter-process communication channels that could be observed, with optional hypervisor memory isolation that extends protection from software-enforced to hardware-enforced via ARM Stage 2 page tables at zero performance cost.[1]
I was under the impression this wasn't possible if you are using the GPU. I could be misled on this though.
[1] https://github.com/Layr-Labs/d-inference/blob/master/papers/...
More specifically, anyone using Darkbloom commercially should only really send non-sensitive data (no tokens, customer data, ...). I'd say only classification tasks, image generation, etc.
Macs have secure enclaves.
But they argue that:
> PT_DENY_ATTACH (ptrace constant 31): Invoked at process startup before any sensitive data is loaded. Instructs the macOS kernel to permanently deny all ptrace requests against this process, including from root. This blocks lldb, dtrace, and Instruments.
> Hardened Runtime: The binary is code-signed with hardened runtime options and explicitly without the com.apple.security.get-task-allow entitlement. The kernel denies task_for_pid() and mach_vm_read() from any external process.
> System Integrity Protection (SIP): Enforces both of the above at the kernel level. With SIP enabled, root cannot circumvent Hardened Runtime protections, load unsigned kernel extensions, or modify protected system binaries. Section 5.1 proves that SIP, once verified, is immutable for the process lifetime.
gives them memory protection.
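For anyone wondering what the first of those three layers looks like in practice, here is a minimal sketch of the PT_DENY_ATTACH call via ctypes. It is not Darkbloom's code: the function name `deny_debugger_attach` is made up for illustration, and it only covers the ptrace part. Hardened Runtime and SIP are properties of code signing and system configuration, not something a process switches on like this.

```python
import ctypes
import sys

PT_DENY_ATTACH = 31  # macOS-specific ptrace request (the "constant 31" in the quote)


def deny_debugger_attach():
    """Ask the macOS kernel to refuse debugger attaches to this process.

    Returns True if the request succeeded, False on other platforms or on failure.
    (Hypothetical helper name, for illustration only.)
    """
    if sys.platform != "darwin":
        return False  # request 31 only has this meaning on macOS
    libc = ctypes.CDLL(None, use_errno=True)
    # macOS prototype: int ptrace(int request, pid_t pid, caddr_t addr, int data)
    libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_int]
    libc.ptrace.restype = ctypes.c_int
    return libc.ptrace(PT_DENY_ATTACH, 0, None, 0) == 0


if __name__ == "__main__":
    print("attach denied:", deny_debugger_attach())
```

Whether this holds up against someone who controls the machine is exactly what the rest of the thread is arguing about; the sketch only shows the mechanism.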
To me that is surprising.
Protection here is conditional, best-effort. There are no true guarantees, nor actual verifiability.
If it's not running fully end to end in some secure enclave, then it's always just a best effort thing. Good marketing though.
Very smart play to build a platform, get scale, and prove out the software. Then either add a small network fee (this could be on money movement on/off platform), add a higher tier of service for money, and/or just use the proof points to go get access to capital and become an operator in your own pool.
Non-VC play (not required until you can raise on your own terms!) and clear differentiation.
If you want to go full business evaluation, I would be more worried about someone else implementing the same thing with a higher commission for operators (imo 95% and first to market is good enough).
Others are reporting low demand, eg.: https://news.ycombinator.com/item?id=47789171
As a business owner, I can think of multiple reasons why a decentralized network is better for me as a business than relying on a hyperscaler inference provider.
1. No dependency on a BigTech provider who can cut me off or change prices at any time. I’m willing to pay a premium for that.
2. I get a residential IP proxy network built-in. AI scrapers pay big money for that.
3. No censorship.
4. Lower latency if inference nodes are located close to me.
Guess there are limitations on the size of the models, but if top-tier models get democratized I don’t see a reason not to use this API. The only thing that comes to me is data privacy concerns.
I think batch-evals for non-sensitive data has great PMF here.
Because they were already at the finish line with Apple Silicon.
> I don’t see a reason not to use this API. The only thing that comes to me is data privacy concerns.
The whole inference is end-to-end encrypted so none of the nodes can see the prompts or the messages.
That would finally be a crypto thing which is backed by value I believe in.
My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B. Darkbloom's pricing is $0.20 per Mtok output.
That's about $2.24/day or $67/mo revenue if it's fully utilized 24/7.
Now assuming 50W sustained load, that's about 36 kWh/mo, at ~$.25/kWh approx. $9/mo in costs.
Could be good for lunch money every once in a while! Around $700/yr.
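If you want to plug in your own numbers, the estimate above can be sketched as a few lines of Python. The 30-day month and 100% utilization are assumptions carried over from the comment, and the function name is just for illustration.

```python
def monthly_estimate(tok_per_s, usd_per_mtok, watts, usd_per_kwh, days=30):
    """Rough monthly revenue and power cost at 100% utilization (optimistic)."""
    seconds = days * 24 * 3600
    revenue = tok_per_s * seconds / 1_000_000 * usd_per_mtok  # $ per month
    energy_kwh = watts * days * 24 / 1000                     # kWh per month
    cost = energy_kwh * usd_per_kwh                           # $ per month
    return revenue, energy_kwh, cost


# Figures from the comment: 130 tok/s, $0.20/Mtok, 50 W, $0.25/kWh
revenue, kwh, cost = monthly_estimate(130, 0.20, 50, 0.25)
print(f"revenue ${revenue:.2f}/mo, {kwh:.0f} kWh/mo, power ${cost:.2f}/mo")
print(f"net ${(revenue - cost) * 12:.0f}/yr")
```

Dropping utilization below 100% scales the revenue line down proportionally, which matters given the demand reports elsewhere in the thread.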
I don’t think this is a sustainable business model. For example, Cubbit tried to build decentralised storage, but I backed out because better alternatives now exist, and hardware continues to improve and become cheaper over time.
Your electricity and ownership costs are going to get a lower return, and this does not actually reduce CO2.
I’d imagine 1 year of heavy usage would somehow affect its quality.
I'd say it's not worth it. But the idea is cool.
For Gemma 4 26B their math is:
single_tok/s = (307 GB/s / 4 GB) * 0.60 = 46.0 tok/s
batched_tok/s = 46.0 * 10 * 0.9 = 414.4 tok/s
tok/hr = 414.4 * 3600 = 1,492,020
revenue/hr = (1,492,020 / 1M) * $0.200000 = $0.2984
I have no idea if that is a good estimate of how much an M5 Pro can generate - but that’s what it says on their site.
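The site's math above can be reproduced with a short sketch. The parameter names are my reading of the factors, not anything the site states: I am assuming 0.60 is a bandwidth-efficiency factor, 10 is a batch size, 0.9 is a batching-efficiency factor, and 4 GB is the data read per token.

```python
def site_estimate(bandwidth_gb_s, gb_per_token, efficiency=0.60,
                  batch=10, batch_efficiency=0.9, usd_per_mtok=0.20):
    """Reproduce the quoted bandwidth-bound throughput and revenue estimate."""
    single = bandwidth_gb_s / gb_per_token * efficiency  # tok/s, one stream
    batched = single * batch * batch_efficiency          # tok/s, batched
    tok_per_hr = batched * 3600
    revenue_per_hr = tok_per_hr / 1_000_000 * usd_per_mtok
    return single, batched, tok_per_hr, revenue_per_hr


single, batched, tok_per_hr, revenue_per_hr = site_estimate(307, 4)
print(f"{single:.1f} tok/s single, {batched:.1f} tok/s batched")
print(f"{tok_per_hr:,.0f} tok/hr, ${revenue_per_hr:.4f}/hr")
```

Note that the quoted lines only add up if you carry the unrounded 46.05 tok/s through the later steps, not the displayed 46.0.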
They do a bit of a sneaky thing with the power calculation: they subtract 12 W of idle power, because they are assuming your machine is idling 24/7 anyway, so the only cost is the extra 18 W they estimate you’ll use doing inference. Idk about you, but I do turn my machine off when I am not using it.
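To put a number on the difference, here is a small sketch comparing the two ways of counting power. The 12 W and 18 W figures are the ones mentioned above; the $0.25/kWh price and the 30-day month are assumptions borrowed from the earlier estimate in this thread, not from the site.

```python
def power_cost_per_month(watts, usd_per_kwh, hours=24 * 30):
    """Electricity cost in $ for a constant draw over one (30-day) month."""
    return watts / 1000 * hours * usd_per_kwh


IDLE_W = 12    # idle draw the site subtracts
EXTRA_W = 18   # incremental draw during inference, per the site
PRICE = 0.25   # $/kWh, assumed

incremental = power_cost_per_month(EXTRA_W, PRICE)       # the site's accounting
full = power_cost_per_month(IDLE_W + EXTRA_W, PRICE)     # machine would otherwise be off
print(f"incremental ${incremental:.2f}/mo vs full ${full:.2f}/mo")
```

If the machine would otherwise be switched off, the full figure is the one that applies to you.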
What could possibly go wrong?
;P
Apple Silicon has a Secure Enclave, but not a public SGX/TDX/SEV-style enclave for arbitrary code, so these claims are about OS hardening, not verifiable confidential execution.
It would be nice if it were possible. There's a lot of cool innovations possible beyond privacy.
When your Mac is idle (no inference requests), it consumes minimal power — you don't lose significant money waiting for requests. The electricity costs shown only apply during active inference.
Text models typically see the highest and most consistent demand. Image generation and transcription requests are bursty: high volume during peaks, quiet otherwise.
In 15 minutes of serving Gemma, I got precisely zero actual inference requests, and a bunch of health checks and two attestations.
At the moment they don't have enough sustained demand to justify the earning estimates.
btown•1h ago
> Apple’s attestation servers will only generate the FreshnessCode for a genuine device that checks in via APNs. A software-only adversary cannot forge the MDA certificate chain (Assumption 3). Combined with SIP enforcement (preventing binary replacement) and Secure Boot (preventing bootloader tampering), this provides strong evidence that the signing key resides in genuine Apple hardware.
nl•1h ago
NVidia data center GPUs have a similar path, but not their consumer ones. Not sure about the NVidia Spark.
It's possible AMD Strix Halo can do this, but unlikely for any other PC based GPU environments.
MrDrMcCoy•55m ago