Show HN: Eclaire – Open-source, privacy-focused AI assistant for your data

https://www.eclaire.co/
1•korale•35s ago•0 comments

The Simplest iPhone Camera App

https://apps.apple.com/us/app/minimal-cam-zero-ui-gestures/id960929710
1•Jonovono•55s ago•1 comments

Luxor.jl – Simple drawings using vector graphics, Cairo "for tourists "

https://github.com/JuliaGraphics/Luxor.jl
1•TheWiggles•1m ago•0 comments

Claude Code with MCP is all you need

https://composio.dev/blog/cluade-code-with-mcp-is-all-you-need
1•sixhobbits•1m ago•0 comments

New mission could unravel mysteries of heliosphere (complex cosmic environment)

https://www.cnn.com/2025/09/24/science/heliosphere-spacex-nasa-imap-launch
1•Brajeshwar•1m ago•0 comments

How the Silicon Valley 'warlord' Steven Simoni got The Pentagon's attention

https://www.reuters.com/technology/how-silicon-valley-warlord-got-pentagons-attention-2025-10-01/
1•giuliomagnifico•2m ago•0 comments

Remote work isn't the problem. Mediocre leadership is

https://www.fastcompany.com/91391672/remote-work-isnt-the-problem-mediocre-leadership-is-work-fro...
1•speckx•2m ago•0 comments

Kubernetes Service vs. Headless Service Explained

https://www.timeplus.com/post/kubernetes-service-vs-headless-service
1•gangtao•3m ago•0 comments

Show HN: Claude Code router 2.0 – preference-aligned routing to multiple LLMs

https://github.com/katanemo/archgw/tree/main/demos/use_cases/claude_code_router
1•adilhafeez•4m ago•1 comments

Ancient Rock art in Saudi Arabia hints at how humans repopulated desert

https://www.abc.net.au/news/science/2025-10-01/rock-art-sites-discovered-saudi-arabia-desert-arid...
1•Brajeshwar•4m ago•0 comments

I've locked myself out of my digital life (2022)

https://shkspr.mobi/blog/2022/06/ive-locked-myself-out-of-my-digital-life/
1•Brajeshwar•5m ago•0 comments

ADP report: Private employers unexpectedly shed 32,000 jobs

https://finance.yahoo.com/news/adp-report-private-employers-unexpectedly-shed-32000-jobs-as-labor...
1•ndiddy•5m ago•1 comments

Thomas Pynchon's Books: A Guide

https://www.nytimes.com/article/thomas-pynchon-books.html
1•anarbadalov•7m ago•0 comments

Claude Code Is Having Its Cursor Moment

https://www.vincentschmalbach.com/claude-code-is-having-its-cursor-moment/
1•vincent_s•8m ago•1 comments

Taiwan will not agree to 50-50 chip production deal with US, negotiator says

https://www.reuters.com/world/asia-pacific/taiwan-will-not-agree-50-50-chip-production-deal-with-...
1•voxadam•9m ago•1 comments

AI Powered Calorie and Macro Tracking

https://www.bodhigpt.com/tools/health-tracker
2•whatcha•10m ago•1 comments

Funding the Frontier predicts the impact of your research

https://www.nature.com/articles/d41586-025-03120-6
1•rntn•11m ago•0 comments

Ask HN: Who's Looking for Cofounders (October 2025)

1•kristopolous•13m ago•0 comments

Everyday Site Builder

https://faces.app/
4•ignaciosoffia•14m ago•1 comments

Reddit stock slides as ChatGPT citations fall

https://techyquantum.com/reddit-stock-falls-as-chatgpt-references-drop/
1•techxofyar•14m ago•0 comments

Space-time crystals from particle-like topological solitons

https://www.nature.com/articles/s41563-025-02344-1
1•PaulHoule•14m ago•0 comments

Battering RAM: Low-Cost Interposer Attacks on Confidential Computing

https://batteringram.eu/
2•mici•15m ago•1 comments

Development Gets Better with Age

https://www.allthingsdistributed.com/2025/10/better-with-age.html
4•caution•15m ago•0 comments

Claude plays Catan: Managing agent context with Sonnet 4.5 [video]

https://www.youtube.com/watch?v=BER3EhUIyz0
1•kerim-ca•16m ago•0 comments

Billionaire Blast Off

https://billionaireblastoff.firefox.com/
2•HieronymusBosch•16m ago•1 comments

Raspberry Pi prices hiked as AI gobbles all the memory

https://www.theregister.com/2025/10/01/raspberry_pi_price_hikes/
2•geerlingguy•16m ago•0 comments

Once again, Netanyahu has outplayed Trump

https://www.theguardian.com/commentisfree/2025/oct/01/netanyahu-trump-israel-gaza-peace-plan
2•NomDePlum•16m ago•1 comments

AI is the new social media

https://www.vincentschmalbach.com/ai-is-the-new-social-media/
1•vincent_s•17m ago•0 comments

Trump's peace plan is everything Israelis dreamed of. But it's a fantasy

https://www.theguardian.com/commentisfree/2025/oct/01/trump-peace-plan-fantasy-us-president-gaza
3•hebelehubele•18m ago•1 comments

Ask HN: Do You Use Teamblind.com?

2•napolux•18m ago•1 comments

Building a 30 PB storage cluster in the heart of SF

https://si.inc/posts/the-heap/
62•nee1r•1h ago

Comments

g413n•1h ago
No mention of disk failure rates? curious how it's holding up after a few months
ClaireBookworm•54m ago
good point
bayindirh•44m ago
Disk failure rates are very low compared to a decade ago. Back then I used to change more than a dozen disks every week. Now it's an eyebrow-raising event which I seldom see.

I think following Backblaze's hard disk stats is enough at this point.

cjaackie•34m ago
They mentioned the cluster runs on used enterprise drives. I can see the desire to save money, but I agree: that is going to be one expensive mistake down the road.

I should also note personally for home cluster use, I learned quickly that used drives didn’t seem to make sense. Too much performance variability.

g413n•24m ago
eh in a datacenter context failure rates are just a remote-hands recurring cost so that's maybe not too bad with front-loaders? e.g. have someone show up to the datacenter with a grocery list of slot indices and a cart of fresh drives every few months.
guywithahat•22m ago
Used drives make sense if maintaining your home server is a hobby. It's fun to diagnose and solve problems in home servers, and failing drives give me a reason to work on the server. (I'm only half-joking, it's kind of fun)
jms55•18m ago
If I remember correctly, most drives either:

1. Fail in the first X amount of time

2. Fail towards the end of their rated lifespan

So buying used drives doesn't seem like the worst idea to me. You've already filtered out the drives that would fail early.

Disclaimer: I have no idea what I'm talking about

ClaireBookworm•53m ago
great write up, really appreciate the explanations / showing the process
nharada•45m ago
So how do they get this data to the GPUs now...? Just run it over the public internet to the datacenter?
bayindirh•43m ago
They can rent a dark fiber for themselves for that distance, and it'll be cheap.

However, as they noted, they use 100 Gbps capacity from their ISP.

nee1r•38m ago
We want to get darkfiber from the datacenter to the office. I love 100Gbps
g413n•42m ago
7.5k for zayo 100gig so that's like half of the MRC
nee1r•40m ago
yeah, exactly! we have a 100G uplink, and then we use nginx secure links that we then just curl from the machines using HTTP. (funnily HTTPS adds overhead so we just pre-sign URLs)
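For readers unfamiliar with the approach nee1r describes, here is a minimal sketch of pre-signing a URL for nginx's secure_link module. The secret, the path, the query argument names, and the exact secure_link_md5 expression are assumptions for illustration; the thread does not say how their nginx is actually configured.

```python
import base64
import hashlib
import time
from typing import Optional


def presign(path: str, secret: str, ttl_s: int = 3600, now: Optional[int] = None) -> str:
    """Return `path` with md5/expires query args for nginx secure_link.

    Assumes a hypothetical nginx config of the form:
        secure_link $arg_md5,$arg_expires;
        secure_link_md5 "$secure_link_expires$uri <secret>";
    """
    issued_at = int(time.time()) if now is None else now
    expires = issued_at + ttl_s
    # nginx hashes the expanded expression, then compares against the
    # base64url-encoded MD5 digest with the '=' padding stripped.
    digest = hashlib.md5(f"{expires}{path} {secret}".encode()).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{path}?md5={token}&expires={expires}"


if __name__ == "__main__":
    # Hypothetical object path and secret.
    print(presign("/shards/000123.tar", "not-a-real-secret"))
```

A worker would then fetch the printed path with a plain HTTP request. The signature only gates access until the link expires; it does not encrypt the transfer.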
miniman1337•41m ago
Used Disks, No DR, not exactly a real shoot out.
nee1r•36m ago
True, though this is specifically for pretraining data (S3 wouldn't sell us used disk + no DR storage).
p_ing•18m ago
You're in a seismically active part of the world. Will the venture last in a total loss scenario?
nee1r•12m ago
We're currently 1/1 for the recent 4.3 magnitude earthquake (though if SF crumbles we might lose data)
leejaeho•38m ago
how long do you think it'll be before you fill all of it and have to build another cluster LOL
nee1r•34m ago
Already filled up and looking to possibly copy and paste :)
giancarlostoro•23m ago
So, others have asked, and I'm curious myself: are you sourcing the videos yourselves or from third parties?
not--felix•36m ago
But where do you get 90 million hours worth of video data?
myflash13•30m ago
And not just any video data, they specifically mentioned screen recordings for agentic computer use. A very specific kind of video. My guess is they have a partnership with someone like Rewind.ai
conception•29m ago
Arrr matey
mschuster91•26m ago
Shows how crazy cheap on prem can be. tips hat
nee1r•23m ago
tips hat back
stackskipton•16m ago
Not included is the overhead of dealing with maintenance. S3/R2 generally don't require a dedicated ops person for care and feeding. This type of setup will likely require someone to spend 5 hours a week dealing with it.
mschuster91•13m ago
I once had about three racks full of servers under my control. Admittedly there weren't a ton of disks, but still the hardware maintenance effort was pretty much negligible over a few years (until it all went to the cloud).

Most of my server wrangling time went to OS updates and, most annoyingly, OpenStack. But that's something you can't escape even if you run your stuff in the cloud...

nee1r•8m ago
True, this is a large reason why we chose to have the datacenter a couple blocks away from the office.
g413n•23m ago
the doodles are great
nee1r•22m ago
Thanks! Lots of hard work went into them.
zparky•20m ago
$125/disk, 12k/mo depreciation cost which i assume means disk failures, so ~100 disks/mo or 1200/yr, which is half of their disks a year - seems like a lot.
devanshp•14m ago
no, we wanted to be conservative by depreciating somewhat more aggressively than that. we have much closer to 5% yearly disk failure rates.
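The two readings of the $12k/mo figure can be compared with a quick calculation. This is only a sketch using numbers quoted in the thread ($125/disk, 2400 drives, roughly 5% yearly failures); the function names are made up for illustration.

```python
DISK_PRICE_USD = 125                  # per-disk price quoted by zparky
FLEET_SIZE = 2400                     # drive count mentioned in the thread
DEPRECIATION_USD_PER_MONTH = 12_000   # line item zparky is reading
ANNUAL_FAILURE_RATE = 0.05            # "much closer to 5%" per devanshp


def disks_per_year_if_all_failures(monthly_budget: float, disk_price: float) -> float:
    """Disks replaced per year if the whole monthly budget bought replacements."""
    return monthly_budget / disk_price * 12


def replacement_cost_per_month(fleet: int, afr: float, disk_price: float) -> float:
    """Monthly replacement spend implied by an annual failure rate."""
    return fleet * afr * disk_price / 12


if __name__ == "__main__":
    implied = disks_per_year_if_all_failures(DEPRECIATION_USD_PER_MONTH, DISK_PRICE_USD)
    print(f"zparky's reading: {implied:.0f} disks/yr ({implied / FLEET_SIZE:.0%} of fleet)")
    cost = replacement_cost_per_month(FLEET_SIZE, ANNUAL_FAILURE_RATE, DISK_PRICE_USD)
    print(f"at 5% failures: about ${cost:,.0f}/mo in replacement disks")
```

Under these assumptions, a 5% failure rate means about 120 replacement disks a year, which accounts for only a small part of the $12k/mo figure.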
AnotherGoodName•12m ago
It's an accounting term. You need to report the value of your company's assets each reporting cycle. This allows you to report company profit more accurately, since the 2400 drives likely aren't worth what the company originally paid. It's stated as a tax write-off, but people get confused by that term (they think X written off == X less tax paid). It's better to describe it as a way to report profit more accurately (which may end up with less company tax paid, but obviously not 1:1 since company tax is not 100%).

So anyway you basically pretend you resold the drives today. Here they are assuming in 3 years time no one will pay anything for the drives. Somewhat reasonable to be honest since the setup's bespoke and you'll only get a fraction of the value of 3 year old drives if you resold them.
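The "worth nothing in 3 years" assumption corresponds to straight-line depreciation to zero. Here is a minimal sketch, assuming a drive-only purchase cost of 2400 × $125 and a 36-month life; the original post's depreciation figure may cover more hardware than just the drives.

```python
def book_value(cost: float, months_elapsed: int,
               life_months: int = 36, salvage: float = 0.0) -> float:
    """Straight-line book value: equal write-down each month until salvage value."""
    if months_elapsed >= life_months:
        return salvage
    monthly_writedown = (cost - salvage) / life_months
    return cost - monthly_writedown * months_elapsed


if __name__ == "__main__":
    drives_cost = 2400 * 125  # assumed: drives only, prices from the thread
    for month in (0, 12, 24, 36):
        print(f"month {month:2d}: ${book_value(drives_cost, month):,.0f}")
```

The monthly write-down is a reporting expense, not cash spent on failed drives, which is the distinction the comment above is drawing.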

ttfvjktesd•11m ago
The biggest part that is always missing in such comparisons is employee salaries. In the calculation they give $354k of total cost per year. But now add the cost of staff in SF to operate that thing.
OutOfHere•6m ago
Is it correct that you have zero data redundancy? This may work for you if you're just hoarding videos from YouTube, but not for most people who require an assurance that their data is safe. Even for you, it may hurt proper benchmarking, reproducibility, and multi-iteration training if the parent source disappears.
nee1r•4m ago
Definitely much less redundancy; this was a tradeoff we made for pretraining data and cost.
RagnarD•3m ago
I love this story. This is true hacking and startup cost awareness.