You want to compare the _maximum_ memory under load - the high-water mark. If you combine Wozz with a benchmark to drive heavy traffic, you could more confidently say any unused memory was truly wasted.
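Something like this is what I have in mind, sampling kubectl top while the load test runs and keeping the peak (assumes metrics-server is installed, memory is reported in Mi, and the pod name is a placeholder):

    # Poll usage every 5s during the load test and record the high-water mark; Ctrl-C to stop
    peak=0
    while sleep 5; do
      cur=$(kubectl top pod my-app-pod --no-headers | awk '{print $3}' | sed 's/Mi$//')
      (( cur > peak )) && peak=$cur
      echo "current=${cur}Mi  peak=${peak}Mi"
    done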
Note: the tool calls out to Microsoft, but the docs don't say what data is shared ("telemetry"). Better make sure your firewalls are functioning before running this tool.
Not sure what the source license story is.
Yeah, defaults are the enemy here. Most of the waste I'm seeing in the data comes from generic Spring Boot apps running with out-of-the-box settings, where the JVM assumes it owns the entire node.
    curl -o wozz-audit.sh https://wozz.io/audit.sh
    cat wozz-audit.sh
    bash wozz-audit.sh
That definitely doesn't work on `curl 8.7.1 (x86_64-apple-darwin25.0) libcurl/8.7.1 (SecureTransport)` because you return a 307 with 'Redirecting...' in the body, so `wozz-audit.sh` is just the string 'Redirecting...'. I have to pass curl the `-L` flag to follow the redirect. Then everything works.

In the old days you could tune IIS + the garbage collector manually to get similar behaviour, but it was usually not worth it. Time was better spent optimizing other things in the pipe and living with the GC freezes. I suspect GC hiccups should now be much smaller/shorter too, as the overall load will be lower with the new GC.
Curl-bash without hash-checking from a four-month-old domain with full k8s cluster access? All the identities connected are brand new, including the brand-new HN account that posted this. There are four commits in the repo, and three are back-dated to exactly 1 year ago.
As for the newness: I just open-sourced this from my personal scripts collection this week, so yes, the org and account are new. It runs entirely locally using your active kubeconfig; it doesn't collect credentials or send secrets anywhere. You can cat the script to verify that it's just a wrapper around kubectl top and kubectl get.
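If you're wary of curl | bash, the cautious flow is roughly this (note the -L, which matters because the URL currently answers with a redirect):

    # Download (following redirects), read the script, then decide whether to run it
    curl -fsSL https://wozz.io/audit.sh -o wozz-audit.sh
    less wozz-audit.sh
    bash wozz-audit.sh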
> Memory waste: request - actual usage [0]
Memory "requests" are hints to the kube-scheduler for placement, not a target for expected usage.
> # Memory over-provisioning: limit > 2x request [1]
Memory limits are for enforcement, typically deciding when to invoke the OOM killer.
Neither placement hints nor OOM-killing limits should have anything to do with normal operating parameters.
> The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low [2]
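To make the distinction concrete, a minimal sketch (placeholder names, nothing to do with the tool) sets the request near the steady-state working set and keeps the limit only as an enforcement ceiling:

    # Request ~= steady-state working set (scheduling hint); limit = OOM-kill ceiling
    kubectl set resources deployment/my-app \
      --requests=memory=512Mi \
      --limits=memory=1Gi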
By choosing to label the delta between these two as "waste", you will absolutely suffer from Goodhart's law: you will teach your dev team not just to request memory but to allocate it and never free it, so that they fit inside this invalid metric's assumptions.
It is going to work against the more reasonable goals: having developers set their limits as low as possible without negative effects, protecting the node and the pod from memory leaks, and still gaining the advantages of over-provisioning, which is where the big gains are to be made.
[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[2] https://kubernetes.io/docs/concepts/configuration/manage-res...
If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.
Valid point on Goodhart's law, though the goal shouldn't be to fill the RAM, but rather to lower the request to match the working set so we can bin-pack tighter.
A 64GB instance in Azure is $156.28/month reserved. That's $2.44/GB/month.
Let's say you use an extra 4GB of RAM per replica as a safety margin (or out of laziness), and you have 50 replicas. That's $488.38 per month 'wasted', or about $5,860/year.
You'll never recoup the money it takes to get those replicas perfectly sized. Just give every app 4GB more RAM than you think it needs, and move on with your life.
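(Back-of-the-envelope check of those figures, taking the instance price at face value:)

    awk 'BEGIN {
      per_gb  = 156.28 / 64        # $/GB/month for the 64GB reserved instance
      monthly = per_gb * 4 * 50    # 4GB of slack across 50 replicas
      printf "per GB: $%.2f   monthly: $%.2f   yearly: $%.2f\n", per_gb, monthly, monthly * 12
    }'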
wozzio•5h ago
I wrote a simple CLI tool (bash wrapper around kubectl) to automate diffing kubectl top metrics against the declared requests in the deployment YAML.
I ran it across ~500 pods in production. The average "waste" (allocated vs. used) by language was interesting:
Python: ~60% waste (Mostly sized for startup spikes, then idles empty).
Java: ~48% waste (Devs seem terrified to give the JVM less than 4Gi).
Go: ~18% waste.
The tool is called Wozz. It runs locally, installs no agents, and just uses your current kubecontext to find the gap between what you pay for (Requests) and what you use (Usage).
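The core comparison boils down to something like this (simplified sketch, not the actual script; assumes metrics-server is installed):

    # For each pod: declared memory request vs. live usage reported by metrics-server
    NS="${1:-default}"   # namespace as first argument, otherwise "default"
    for pod in $(kubectl get pods -n "$NS" --no-headers -o custom-columns=:metadata.name); do
      req=$(kubectl get pod "$pod" -n "$NS" \
            -o jsonpath='{.spec.containers[0].resources.requests.memory}')
      use=$(kubectl top pod "$pod" -n "$NS" --no-headers | awk '{print $3}')
      echo "$pod  request=${req:-none}  usage=$use"
    done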
It's open source. Feedback welcome.
(Note: The install is curl | bash for convenience, but the script is readable if you want to audit it first).
rvz•5h ago
Correct.
Developers keep over-provisioning because they need enough memory for the app to continue running as demand scales up. Since these languages have their own runtimes and GCs to manage memory, they already pre-allocate lots of RAM before the app does any real work, adding to the bloat.
Part of the problem is not only technical (the language may be bloated and inefficient); it is also psychological, as developers are scared of their app hitting an out-of-memory error in production.
As you can see, the languages with the most waste are the ones that are inefficient in both time (speed) and space (memory), take up the most memory and run slower (Python and Java), and cost a lot of money to keep maintaining.
I got downvoted over questioning the microservice cargo cult [0] with Java being the darling of that cult. If you imagine a K8s cluster with any of these runtimes, you can see which one will cost the most as you scale up with demand + provisioning.
Languages like Go and Rust are the clear winners if you want to save lots of money and are looking for efficiency.
[0] https://news.ycombinator.com/item?id=44950060
PaulKeeble•1h ago
The company wasting 20% extra memory, on the other hand, is still selling and copes with the slower transaction speed just fine.
Not sure over-provisioning memory is really just waste when we have languages with dynamic memory management, which is all modern languages outside real-time safety-critical environments.
gopher_space•1h ago
I'm thinking more like getting a junior to do efficiency passes on a year's worth of data.
cogman10•1h ago
Java will happily eat all the memory you throw at it. The fact that it isn't doing so here means they are probably relying on the default JVM settings. Those are too conservative inside a container, especially on older JVM versions.
What you'll find is that the JVM will constantly show "waste" due to its default configuration. The question is whether it ends up throwing out-of-memory errors due to the lower memory limit.
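The usual mitigation is to size the heap relative to the container's memory limit rather than trusting the defaults; a minimal sketch, assuming a reasonably recent JDK (10+):

    # Cap the heap as a fraction of the container's memory limit instead of the node's RAM
    export JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"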
If your devs haven't looked at the heap allocation of a running application, there's no way to know if this is too much or too little memory.
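One quick way to get that signal, assuming a JDK ships in the image and the JVM runs as PID 1 (adjust for your setup):

    # Print live heap usage for the JVM inside a running pod (pod name is a placeholder)
    kubectl exec -it my-java-pod -- jcmd 1 GC.heap_info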
Go/python much more readily give memory back to the OS.
karolinepauls•1h ago
I understand we're talking about CPU in case of Python and memory for Java and Go. While anxious overprovisioning of memory is understandable, doing the same for CPU probably means lack of understanding of the difference between CPU limits and CPU requests.
Since I've been out of DevOps for a few years, is there ever a reason not to give each container the ability to spike up to 100% of 1 core? Scheduling of mass container startup should be a solved problem by now.
cogman10•1h ago
Your limit should roughly be "what should this application use if it goes full bore" and your request should be "what does this use at steady state".
At least at my company, the cluster barely uses any CPU even during the busiest hours. We are fairly over provisioned there because a lot of devs are keeping limit and request the same.
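A sketch of that split (placeholder names): request the steady-state CPU, set no CPU limit so the container can burst, and keep only a memory limit as the backstop:

    # Sets a CPU request and a memory limit; no CPU limit is set, so the pod can burst
    kubectl set resources deployment/my-app \
      --requests=cpu=250m,memory=512Mi \
      --limits=memory=1Gi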
wozzio•25m ago
That's why the Python numbers look so bad here: devs set the request high enough to cover that initial model-loading spike so they don't crash during a rollout, even if they idle at 10% usage afterwards.
stefan_•1h ago
This is truly one of the dumbest outcomes of the whole "resume-driven cloud deployment" era. Clouds "market segmenting" with laughable memory, resume engineers wanting their line item but not the recurring cost, engineers wasting weeks investigating and working around out-of-memory issues that come purely from some guy provisioning services with 2003 levels of memory. It's all a giant waste of time.