Show HN: I audited 500 K8s pods. Java wastes ~48% RAM, Go ~18%

https://github.com/WozzHQ/wozz
33•wozzio•5h ago

Comments

wozzio•5h ago
I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.

I wrote a simple CLI tool (bash wrapper around kubectl) to automate diffing kubectl top metrics against the declared requests in the deployment YAML.

I ran it across ~500 pods in production. The "waste" (allocated vs. used) average by language was interesting:

Python: ~60% waste (Mostly sized for startup spikes, then idles empty).

Java: ~48% waste (Devs seem terrified to give the JVM less than 4Gi).

Go: ~18% waste.

The tool is called Wozz. It runs locally, installs no agents, and just uses your current kubecontext to find the gap between what you pay for (Requests) and what you use (Usage).

It's open source. Feedback welcome.

(Note: The install is curl | bash for convenience, but the script is readable if you want to audit it first).
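
For readers who want to see the arithmetic behind the "waste" figures, here is a minimal sketch of the request-versus-usage gap. It is not the Wozz script itself: the function names, the pod names and numbers, and the choice to express waste as a percentage of the request are all assumptions made for illustration.

```python
def parse_mem_mi(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '512Mi' or '4Gi' to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def waste_pct(request: str, usage: str) -> float:
    """Share of the memory request that is currently unused, in percent."""
    req = parse_mem_mi(request)
    used = parse_mem_mi(usage)
    # Clamp at zero: a pod using more than its request has no "waste".
    return max(0.0, (req - used) / req * 100)


# Hypothetical pods: (name, declared request, usage as shown by kubectl top)
pods = [
    ("api-java", "4Gi", "2130Mi"),
    ("worker-go", "512Mi", "420Mi"),
]
for name, request, usage in pods:
    print(f"{name}: {waste_pct(request, usage):.0f}% of request unused")
```

Note that this is a point-in-time number; it says nothing about peaks under load.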

rvz•5h ago
> I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.

Correct.

Developers keep over-provisioning because they need enough memory for the app to keep running as demand scales up. Since these languages have their own runtimes and GCs to manage memory, the runtime pre-allocates a lot of RAM before the app even runs, adding to the bloat.

The problem is not only technical (the language may be bloated and inefficient) but also psychological: developers are scared of their app hitting an out-of-memory error in production.

As you can see, the languages with the most waste (Python and Java) are the ones that are inefficient in both runtime (speed) and space (memory): they take up the most memory, run slower, and cost a lot of money to keep maintaining.

I got downvoted over questioning the microservice cargo cult [0] with Java being the darling of that cult. If you imagine a K8s cluster with any of these runtimes, you can see which one will cost the most as you scale up with demand + provisioning.

Languages like Go and Rust are the clear winners if you want to save lots of money and are looking for efficiency.

[0] https://news.ycombinator.com/item?id=44950060

pestatije•4h ago
Is this an instantaneous measure, or does it cover the whole duration of the process?
dboreham•2h ago
Typically they're overprovisioning through prior experience. In the past something fell over because it didn't have enough memory. So they gave it more. That practice stuck in the brain-model. Perhaps it's no longer valid, but who wants to bring down the service doing Chernobyl experiments?
PaulKeeble•1h ago
You run these tools, find the maximums over weeks of traffic, and set the requests down to minimise cost, and all is well until the event. The event itself doesn't really matter: it increases traffic processing time, and suddenly every service needs more memory to hold all the in-flight transactions. Instead they fail with out-of-memory errors and disappear, all your pods are in restart loops, unable to cope, and you have an outage.

The company wasting 20% extra memory on the other hand is still selling and copes with the slower transaction speed just fine.

Not sure over provisioning memory is really just waste when we have dynamic memory based languages, which is all modern languages not in real time safety critical environments.
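
The mechanism described above (slower processing means more transactions held in memory at once) can be sketched with Little's law, L = λW: the average number of in-flight requests equals arrival rate times time in system. The traffic figures and the flat per-request memory cost below are assumptions for illustration only.

```python
def in_flight_memory_mib(requests_per_s: float, latency_s: float,
                         mib_per_request: float) -> float:
    """Estimate memory held by in-flight requests using Little's law."""
    in_flight = requests_per_s * latency_s  # L = lambda * W
    return in_flight * mib_per_request


# Hypothetical service: 200 req/s, 2 MiB held per in-flight request.
normal = in_flight_memory_mib(200, 0.25, 2.0)   # usual latency: 250 ms
degraded = in_flight_memory_mib(200, 2.0, 2.0)  # during "the event": 2 s

print(f"normal:   {normal:.0f} MiB held by in-flight requests")
print(f"degraded: {degraded:.0f} MiB held by in-flight requests")
```

With the arrival rate unchanged, an 8x increase in latency means 8x the memory held by in-flight work, which is the kind of headroom a request sized to the observed maximum does not cover.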

gopher_space•1h ago
> Perhaps it's no longer valid, but who wants to bring down the service

I'm thinking more like getting a junior to do efficiency passes on a year's worth of data.

wozzio•24m ago
Exactly, we call it sleep insurance. It is rational for the on-call engineer to pad the numbers, but it's irrational for the finance team to pay for it.
cogman10•1h ago
Actually, this sounds like your Java devs have misconfigured containers.

Java will happily eat all the memory you throw at it. The fact that it isn't means they are probably relying on the default JVM settings. Those are too conservative inside a container, especially if they are running on older JVM versions.

What you'll find is the JVM will constantly have "waste" due to its default configuration. The question is whether it ends up throwing out-of-memory exceptions under a lower memory limit.

If your devs haven't looked at the heap allocation of a running application, there's no way to know if this is too much or too little memory.

Go/Python much more readily give memory back to the OS.

wozzio•26m ago
The JVM expands to fill the container, but the scheduler still counts that 8GB request as used when packing the node. Even if the app only needs 2GB of working set, we are blocked from scheduling other pods on that wasted 6GB buffer.
karolinepauls•1h ago
> Python: ~60% waste (Mostly sized for startup spikes, then idles empty).

I understand we're talking about CPU in case of Python and memory for Java and Go. While anxious overprovisioning of memory is understandable, doing the same for CPU probably means lack of understanding of the difference between CPU limits and CPU requests.

Since I've been out of DevOps for a few years, is there ever a reason not to give each container the ability to spike up to 100% of 1 core? Scheduling of mass container startup should be a solved problem by now.

cogman10•1h ago
I don't think there is. You should set both and limit doesn't need to match request for CPU.

Your limit should roughly be "what should this application use if it goes full bore" and your request should be "what does this use at steady state".

At least at my company, the cluster barely uses any CPU even during the busiest hours. We are fairly over provisioned there because a lot of devs are keeping limit and request the same.

Narishma•1h ago
What makes you think they're talking about CPU? It reads to me like it's memory.
wozzio•25m ago
CPU bursting is safe: you just get throttled. Memory bursting is dangerous: you get OOMKilled.

That's why the Python numbers look so bad here. Devs set the request high enough to cover that initial model-loading spike so they don't crash during a rollout, even if they idle at 10% usage afterwards.

Marazan•1h ago
Interesting, at the company I work for Java pods are (historically) over provisioned with CPU but quite tightly provisioned with memory
wozzio•23m ago
That is the opposite of what I usually see. Are you trading CPU for RAM by running a more aggressive GC like ZGC or Shenandoah? Usually, people starve the CPU to buy more RAM.
stefan_•1h ago
4GiB oh no we gave it Raspberry Pi memory?

This is truly one of the dumbest outcomes of the whole "resume driven cloud deployment" era. Clouds "market segmenting" with laughable memory, resume engineers want their line but not the recurring cost, engineers waste weeks investigating and working around out of memory issues that are purely some guy provisioning services with 2003 levels of memory. It's all a giant waste of time.

perrygeo•3h ago
I'm not sure I would frame unused memory as "waste". At least not necessarily. If the system is operating below max capacity, that memory isn't wasted - it's a reserve capacity to handle surges. The behavior of the system under load (when it matters) might very well depend on that "wasted" memory.

You want to compare the _maximum_ memory under load - the high-water mark. If you combine Wozz with a benchmark to drive heavy traffic, you could more confidently say any unused memory was truly wasted.
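
A minimal sketch of the high-water-mark comparison, assuming you already have a series of memory samples collected while a load test was running. The sample values and the function name are hypothetical; this is not something Wozz is confirmed to do.

```python
def headroom_pct(request_mib: float, samples_mib: list[float]) -> float:
    """Percentage of the request left unused at the observed peak."""
    peak = max(samples_mib)  # high-water mark across the sampling window
    return (request_mib - peak) / request_mib * 100


# Hypothetical samples (MiB) taken while a benchmark drives heavy traffic.
samples = [1024.0, 3072.0, 2048.0]
request = 4096.0

idle_view = (request - samples[0]) / request * 100
print(f"unused at first sample: {idle_view:.0f}%")
print(f"unused at peak:         {headroom_pct(request, samples):.0f}%")
```

The same pod looks far more wasteful from a single idle sample than from its peak, which is why the measurement window matters.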

karianna•2h ago
Usual caveats of “you should run load to see if it’s truly wasted”, but we do know that Java defaults are not ideal out of the box and so we analysed a ton of workloads on Azure and came up with this: https://learn.microsoft.com/en-us/java/jaz - better defaults out of the box for the JVM in containers at runtime.
ice3•1h ago
No source, useless outside of azure.
jeroenhd•1h ago
You can just install it outside of Azure: https://learn.microsoft.com/en-us/java/jaz/overview#install-...

Note: the tool calls out to Microsoft but the docs don't say what data is shared ("telemetry"), better make sure your firewalls are functioning before running this tool.

Not sure what the source license story is.

wozzio•5m ago
That doc is gold, thanks for linking.

Yeah, defaults are the enemy here. Most of the waste I'm seeing in the data comes from generic Spring Boot apps running with out-of-the-box settings where the JVM assumes it owns the entire node.

dboreham•2h ago
A tool that claims to solve a clearly unsolvable problem!
arjie•1h ago
I often end up over-provisioning because it's got a cliff effect. If I under-provision, I end up losing the entire program and its work. If I over-provision, I see a smoother increase in running cost. An amusing note:

    curl -o wozz-audit.sh https://wozz.io/audit.sh
    cat wozz-audit.sh
    bash wozz-audit.sh
That definitely doesn't work on `curl 8.7.1 (x86_64-apple-darwin25.0) libcurl/8.7.1 (SecureTransport)` because you return a 307 with 'Redirecting...' in the body so `wozz-audit.sh` is just the string 'Redirecting...'. I have to pass curl the `-L` flag to follow the redirect. Then everything works.
xav0989•1h ago
Curl always requires -L to follow redirects.
seabrookmx•1h ago
Neat. Might have to try this on .NET, especially since v10 just got a new (default) GC that claims to use a lot less memory at idle.
BatteryMountain•1h ago
It's not just idle: it is a bit more aggressive about cleaning up soon after objects are de-referenced, meaning the OS gets the memory back sooner. Very useful in situations where you have many small containers/VMs running dotnet stuff, or in large applications where memory pressure is an issue and your usage pattern can benefit from memory being released earlier.

In the old days you could tune IIS + the garbage collector manually to get similar behaviour, but it was usually not worth it. Time was better spent elsewhere, optimizing other things in the pipe and living with the GC freezes. I suspect GC hiccups should now be much smaller/shorter too, as the overall load will be lower with the new GC.

dev_l1x_be•1h ago
How much is the k8s waste?
dantillberg•1h ago
This has to be malware in poor disguise.

Curl-bash without hash-checking from a four-month-old domain with full k8s cluster access? All the identities connected are brand new, including the brand-new HN account that posted this. There are four commits in the repo, and three are back-dated to exactly 1 year ago.

unsnap_biceps•43m ago
And it's possible to serve different content if the curl is being piped to a shell vs if it's being piped to a file or stdout

https://github.com/Stijn-K/curlbash_detect

wozzio•18m ago
The curl | bash is just for convenience; the README explicitly advises you to download and inspect wozz.sh first if you aren't comfortable piping to a shell.

As for the newness: I just open-sourced this from my personal scripts collection this week, so yes, the org and account are new. It runs entirely locally using your active kubeconfig; it doesn't collect credentials or send secrets anywhere. You can cat the script to verify that it's just a wrapper around kubectl top and kubectl get.

nyrikki•1h ago
IMHO there are some serious problems here: this won't apply to many situations, it is not really "waste" in the way claimed, and it will probably result in greater spend.

> Memory waste: request - actual usage [0]

Memory "requests" are hints to the kube-scheduler for placement, not a target for expected usage.

> # Memory over-provisioning: limit > 2x request [1]

Memory limits are for enforcement, typically deciding when to call the OOM killer.

Neither placement nor OOM-killing limits should have anything to do with normal operating parameters.

> The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low [2]

By choosing to label the delta between these two as "waste" you will absolutely suffer from Goodhart's law and you will teach your dev team to not just request, but allocate memory and don't free it so that they can fit inside this invalid metric's assumptions.

It is going to work against the more reasonable goals of having developers set their limits as low as possible without negative effects, while also protecting the node and pod from memory leaks, while still gaining the advantages of over-provisioning, which is where the big gains are to be made.

[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....

[1] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....

[2] https://kubernetes.io/docs/concepts/configuration/manage-res...

wozzio•13m ago
You are technically right that requests are scheduling hints, but in a cluster-autoscaler world, requests = bill.

If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.

Valid point on Goodhart's law, though. The goal shouldn't be to fill the RAM, but rather to lower the request to match the working set so we can bin-pack tighter.
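
To make the bin-packing argument concrete, here is a rough sketch that counts how many nodes a set of memory requests needs under a simple first-fit-decreasing placement. This is a deliberate simplification and an assumption: the real kube-scheduler and cluster autoscaler also weigh CPU, allocatable overhead, affinity, and more. The node size and pod counts are made up.

```python
def nodes_needed(requests_gib: list[float], node_gib: float) -> int:
    """Count nodes used by first-fit-decreasing placement on memory only."""
    free: list[float] = []  # remaining capacity of each node opened so far
    for req in sorted(requests_gib, reverse=True):
        if req > node_gib:
            raise ValueError(f"request {req} GiB exceeds node size")
        for i, cap in enumerate(free):
            if cap >= req:
                free[i] = cap - req  # pod fits on an existing node
                break
        else:
            free.append(node_gib - req)  # no node had room: open a new one
    return len(free)


# Hypothetical: 10 replicas on 32 GiB nodes, padded vs. right-sized requests.
print("8 GiB requests:", nodes_needed([8.0] * 10, 32.0), "nodes")
print("2 GiB requests:", nodes_needed([2.0] * 10, 32.0), "nodes")
```

Because placement is driven by requests rather than usage, the padded version opens extra nodes even if the pods never touch the memory.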

linuxftw•35m ago
Let's do the math:

A 64GB instance in Azure is $156.28/month reserved. That's $2.44/GB/month.

Let's say you use an extra 4GB of RAM for a safety margin or out of laziness, and you have 50 replicas. That's about $488 per month 'wasted', or roughly $5,860/year.

You'll never recoup the money it takes to get those replicas perfectly sized. Just give every app 4GB more RAM than you think it needs, and move on with your life.
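
The arithmetic above, spelled out as a small sketch so the inputs are easy to swap. The price and instance size are the ones quoted in this comment; the function name is made up for illustration.

```python
def padding_cost_per_month(extra_gib: float, replicas: int,
                           price_per_gib: float) -> float:
    """Monthly cost of the extra memory requested across all replicas."""
    return extra_gib * replicas * price_per_gib


instance_price = 156.28  # USD/month for the 64 GB reserved instance quoted above
instance_gib = 64
price_per_gib = instance_price / instance_gib

monthly = padding_cost_per_month(4, 50, price_per_gib)
yearly = monthly * 12

print(f"price per GB: ${price_per_gib:.2f}/month")
print(f"padding cost: ${monthly:.2f}/month, ${yearly:.2f}/year")
```

Changing the replica count or the padding per pod shows how quickly the figure grows on larger fleets.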
