frontpage.

Verifying your Matrix devices is becoming mandatory

https://element.io/blog/verifying-your-devices-is-becoming-mandatory-2/
64•LorenDB•2h ago•40 comments

Jailbreaking AI Models to Phish Elderly Victims

https://simonlermen.substack.com/p/can-ai-models-be-jailbroken-to-phish
54•DalasNoin•2h ago•19 comments

Blame as a Service

https://www.humaninvariant.com/blog/blame
54•humaninvariant•1w ago•2 comments

Loose wire leads to blackout, contact with Francis Scott Key bridge

https://www.ntsb.gov:443/news/press-releases/Pages/NR20251118.aspx
247•DamnInteresting•6h ago•101 comments

Researchers discover security vulnerability in WhatsApp

https://www.univie.ac.at/en/news/detail/forscherinnen-entdecken-grosse-sicherheitsluecke-in-whatsapp
143•KingNoLimit•6h ago•44 comments

Crypto got everything it wanted. Now it's sinking

https://www.economist.com/finance-and-economics/2025/11/18/crypto-got-everything-it-wanted-now-it...
14•pseudolus•1h ago•21 comments

Workday to acquire Pipedream

https://newsroom.workday.com/2025-11-19-Workday-Signs-Definitive-Agreement-to-Acquire-Pipedream
38•gaws•2h ago•27 comments

Europe is scaling back GDPR and relaxing AI laws

https://www.theverge.com/news/823750/european-union-ai-act-gdpr-changes
543•ksec•12h ago•562 comments

Meta Segment Anything Model 3

https://ai.meta.com/sam3/
288•lukeinator42•9h ago•58 comments

Building more with GPT-5.1-Codex-Max

https://openai.com/index/gpt-5-1-codex-max/
344•hansonw•8h ago•199 comments

Precise geolocation via Wi-Fi Positioning System

https://www.amoses.dev/blog/wifi-location/
110•nicosalm•5h ago•59 comments

Microsoft AI CEO pushes back against critics after recent Windows AI backlash

https://www.windowscentral.com/microsoft/windows-11/microsoft-ai-ceo-pushes-back-against-critics-...
103•thewebguyd•6h ago•109 comments

How Slide Rules Work

https://amenzwa.github.io/stem/ComputingHistory/HowSlideRulesWork/
56•ColinWright•5h ago•16 comments

What really happened with the CIA and The Paris Review?

https://www.theparisreview.org/blog/2025/11/11/what-really-happened-with-the-cia-and-the-paris-re...
12•frenzcan•1w ago•1 comments

AI is a front for consolidation of resources and power

https://www.chrbutler.com/what-ai-is-really-for
125•delaugust•7h ago•93 comments

Gaming on Linux has never been more approachable

https://www.theverge.com/tech/823337/switching-linux-gaming-desktop-cachyos
248•throwaway270925•5h ago•190 comments

Larry Summers resigns from OpenAI board

https://www.cnbc.com/2025/11/19/larry-summers-epstein-openai.html
271•koolba•13h ago•268 comments

Thunderbird adds native Microsoft Exchange email support

https://blog.thunderbird.net/2025/11/thunderbird-adds-native-microsoft-exchange-email-support/
339•babolivier•15h ago•100 comments

Static Web Hosting on the Intel N150: FreeBSD, SmartOS, NetBSD, OpenBSD and Linux

https://it-notes.dragas.net/2025/11/19/static-web-hosting-intel-n150-freebsd-smartos-netbsd-openb...
127•t-3•9h ago•43 comments

The patent office is about to make bad patents untouchable

https://www.eff.org/deeplinks/2025/11/patent-office-about-make-bad-patents-untouchable
270•iamnothere•5h ago•25 comments

Why CUDA translation won't unlock AMD

https://eliovp.com/why-cuda-translation-wont-unlock-amds-real-potential/
64•JonChesterfield•1w ago•47 comments

Racing karts on a Rust GPU kernel driver

https://www.collabora.com/news-and-blog/news-and-events/racing-karts-on-a-rust-gpu-kernel-driver....
48•mfilion•6h ago•3 comments

Vortex: An extensible, state of the art columnar file format

https://github.com/vortex-data/vortex
40•tanelpoder•5d ago•7 comments

Launch HN: Mosaic (YC W25) – Agentic Video Editing

https://mosaic.so
108•adishj•11h ago•102 comments

Cognitive and mental health correlates of short-form video use

https://psycnet.apa.org/fulltext/2026-89350-001.html
225•smartmic•7h ago•163 comments

Pozsar's Bretton Woods III: The Framework

https://philippdubach.com/2025/10/25/pozsars-bretton-woods-iii-the-framework-1/2/
48•7777777phil•7h ago•22 comments

Branching with or Without PII: The Future of Environments

https://neon.com/blog/branching-environments-anonymized-pii
11•emschwartz•1w ago•3 comments

The Death of Arduino?

https://www.linkedin.com/posts/adafruit_opensource-privacy-techpolicy-activity-739690336223705497...
378•ChuckMcM•7h ago•190 comments

Measuring political bias in Claude

https://www.anthropic.com/news/political-even-handedness
37•gmays•7h ago•59 comments

A $1k AWS mistake

https://www.geocod.io/code-and-coordinates/2025-11-18-the-1000-aws-mistake/
281•thecodemonkey•17h ago•241 comments

Building Burstables: CPU slicing with cgroups

https://www.ubicloud.com/blog/building-burstables-cpu-slicing-with-cgroups
130•msarnowicz•6mo ago

Comments

msarnowicz•6mo ago
Hey, author here. Please AMA.

I came into the Linux world via Postgres, and this was an interesting project for me learning more about Linux internals. While cgroups v2 do offer basic support for CPU bursting, the bursts are short-lived, and credits don’t persist beyond sub-second intervals. If you’ve run into scenarios where more adaptive or sustained bursting would help, we’d love to hear about them. Knowing your use cases will help shape what we build next.
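For anyone who wants to poke at the knobs being discussed here, a minimal sketch of the cgroup v2 interface in Python follows. It assumes root, a cgroup v2 mount at /sys/fs/cgroup with the cpu controller enabled for child groups, and a kernel with cpu.max.burst (5.14+); the group name, quota, and PID are placeholders.

```python
import os

CGROUP = "/sys/fs/cgroup/burst-demo"   # hypothetical group name
os.makedirs(CGROUP, exist_ok=True)

def write(name: str, value: str) -> None:
    with open(os.path.join(CGROUP, name), "w") as f:
        f.write(value)

# cpu.max is "<quota_us> <period_us>": here, 50 ms of CPU time per 100 ms
# period, i.e. a 50% cap on one core.
write("cpu.max", "50000 100000")

# cpu.max.burst (kernel 5.14+) lets unused quota accumulate up to this many
# microseconds and be spent in a later period; it cannot exceed the quota.
write("cpu.max.burst", "50000")

# Move an existing process into the group (replace 1234 with a real PID).
write("cgroup.procs", "1234")
```

This is the sub-second credit behavior the author mentions: unused quota can only accumulate up to cpu.max.burst and is spent within the following periods, not over hours.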

parrit•6mo ago
Thanks! That was a pleasant read. I have been wanting to mess with cgroups for a while, in order to hack together a "docker" of my own, as many have done before, to understand it better. This will help!

Are there typical use cases where you reach for cgroups directly instead of using the container abstraction?

msarnowicz•6mo ago
Thanks for the kind words. Even if you are not building a cloud service, I think it is good to understand how the underlying layer works and what the knobs and limits of the platform are. I could see a use case where two or more processes need to run on one VM or container, maybe for cost-saving reasons or for specific architecture/security reasons, but need to be guaranteed a certain amount of resources and a certain degree of isolation from each other. A rough sketch of that setup is below.
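To make that "two processes sharing one box with guarantees" idea concrete, here is a rough sketch under the same assumptions as the earlier snippet (cgroup v2, root, cpu and memory controllers enabled for children of the root); the group names, weights, and sizes are invented for the example.

```python
import os

ROOT = "/sys/fs/cgroup"

def setup(name: str, cpu_weight: int, mem_min_bytes: int, mem_max_bytes: int) -> str:
    path = os.path.join(ROOT, name)
    os.makedirs(path, exist_ok=True)
    # cpu.weight is a relative share (default 100): under contention, a group
    # with weight 300 gets roughly 3x the CPU of a sibling with weight 100,
    # but either can use idle CPU freely.
    with open(os.path.join(path, "cpu.weight"), "w") as f:
        f.write(str(cpu_weight))
    # memory.min protects this much memory from reclaim; memory.max is a hard cap.
    with open(os.path.join(path, "memory.min"), "w") as f:
        f.write(str(mem_min_bytes))
    with open(os.path.join(path, "memory.max"), "w") as f:
        f.write(str(mem_max_bytes))
    return path

GiB = 1 << 30
setup("app-a", cpu_weight=300, mem_min_bytes=1 * GiB, mem_max_bytes=2 * GiB)
setup("app-b", cpu_weight=100, mem_min_bytes=GiB // 2, mem_max_bytes=1 * GiB)
```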
motrm•6mo ago
Echoing parrit's comment, this was indeed a very nice read and very well written.

I particularly enjoyed the gentle exposition into the world of cgroups and how they work, the levers available, and finally how Ubicloud uses them.

Looking forward to reading how you handle burst credits over longer periods, once you implement that :)

Lovely work, Maciek!

msarnowicz•6mo ago
Thank you very much, I appreciate your comment.
nighthawk454•6mo ago
Great article, thanks! I’ve been curious if there’s any scheduling optimizations for workloads that are extremely burst-y. Such as super low traffic websites or cron job type work - where you may want your database ‘provisioned’ all the time, but realistically it won’t get anywhere near even the 50% cpu minimum at any kind of sustained rate. Presumably those could be hosted at even a fraction of the burst cost. Is that a use case Ubicloud has considered?
msarnowicz•6mo ago
This is a very valid scenario; however, it is not yet fully baked into this implementation. But, as mentioned, this is a starting point. We want to hear feedback and see customers' workloads on Burstables first.

The main challenge here is that cpu.max.burst can be set no higher than the limit set in cpu.max, which limits our options to some extent. But we can still look at some possible implementation choices (a rough sketch of the second one follows below):

- Pack more VMs into the same slice/group, lowering both the minimum guaranteed CPU and the price point. This would increase the chance of running into a "noisy neighbor", but we expect it would not be used for any critical workload.
- Implement the calculation of CPU credits outside of the kernel and change the cpu.max and cpu.max.burst limits dynamically over an extended period of time (hours and days, instead of sub-second).
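To illustrate what that second option could look like, here is a user-space loop that accrues credit while a group runs below its baseline and temporarily raises cpu.max while credit remains. This is only a sketch of the general idea, not Ubicloud's implementation; the group path, baseline, boost, and tick interval are made-up numbers.

```python
import time

CG = "/sys/fs/cgroup/burstable-demo"   # hypothetical group
PERIOD_US = 100_000                    # cpu.max period
BASELINE_US = 50_000                   # 50% baseline quota per period
BOOST_US = 200_000                     # quota while bursting (two cores)
TICK_S = 60                            # accounting interval
MAX_CREDITS_US = 3_600 * 1_000_000     # cap accumulated credit at one CPU-hour

def read_usage_usec() -> int:
    with open(f"{CG}/cpu.stat") as f:
        for line in f:
            key, value = line.split()
            if key == "usage_usec":
                return int(value)
    raise RuntimeError("usage_usec not found")

def set_quota(quota_us: int) -> None:
    with open(f"{CG}/cpu.max", "w") as f:
        f.write(f"{quota_us} {PERIOD_US}")

credits_us = 0
last_usage = read_usage_usec()
while True:
    time.sleep(TICK_S)
    usage = read_usage_usec()
    used = usage - last_usage
    last_usage = usage
    # Earn credit when the VM ran below its baseline allowance for this tick,
    # spend it when it ran above.
    allowance = BASELINE_US * (TICK_S * 1_000_000 // PERIOD_US)
    credits_us = max(0, min(MAX_CREDITS_US, credits_us + allowance - used))
    set_quota(BOOST_US if credits_us > 0 else BASELINE_US)
```

The point of the sketch is the time horizon: because the kernel's own burst credit lives within the quota period, any hours-or-days accounting has to be done outside the kernel and fed back by rewriting cpu.max.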

nighthawk454•6mo ago
Gotcha, thanks for the reply. Makes sense to target burstables first; that seems to be the most common feature set. It's interesting that this isn't readily available in the kernel. I once spoke to some AWS folks dealing with Batch/ECS scheduling of docker container tasks and they hit similar limitations. As a result, their CPU max/burst settings work like the underlying cgroups too.

I imagine writing a custom scheduler would be quite an undertaking!

msarnowicz•6mo ago
I think so, too!
phrotoma•6mo ago
I don't have a question but I really wanted to say thanks for the blog post. Extremely clear and cogent writing on a tricky topic. Well done!
jauntywundrkind•6mo ago
I'd also strongly recommend this view of how Kubernetes uses cgroups, showing similar drill downs for how everything gets managed. Lovely view of what's really happening! https://martinheinz.dev/blog/91

I've been a bit apoplectic in the past that cgroups seemed not super helpful in Kubernetes, but this really showed me how the different Kubernetes QoS levels are driven by similar juggling of different cgroups.

I'm not sure if this makes use of cpu.max.burst or not. There's a fun article that monkeys with these cgroups directly, which is neat to see. It also links to a request for Kubernetes to support the new (5.14) CFS burst feature, which is a whole other fun rabbit hole of fair-share bursting to go down! https://medium.com/@christian.cadieux/kubernetes-throttling-... https://github.com/kubernetes/kubernetes/issues/104516

msarnowicz•6mo ago
Thank you, that is a good perspective, too!
__turbobrew__•6mo ago
cpu.max.burst increases the chances of noisy neighbours stealing CPU from other tenants.

I run multi-tenant k8s clusters with hundreds of tenants and it fundamentally is a hard problem to balance workload performance with efficiency. Sharing resources increases efficiency but in most cases increases tail latencies.

jeffbee•6mo ago
If you use the k8s "Guaranteed" QoS level, CPU resources will be distinct (via cpusets) from the ones used by the riff-raff. This is a good way to segregate latency-sensitive apps, where you care about latency, from throughput-oriented stuff where you don't.
__turbobrew__•6mo ago
Guaranteed QoS isn’t perfect:

1. Neighbours can be noisy to the other hyperthread on the same CPU. For example, heavy usage of AVX-512 and other vectorized instructions can affect a tenant running on the same core but a different hyperthread. You can disable hyperthreading, but then you are making the same tradeoff, sacrificing efficiency for low tail latencies.

2. There are certain locks in the kernel which can be exhausted by certain behaviour of a single tenant. For example, on kernel 5.15 there was one global kernel lock for cgroup resource accounting. If you have a tenant which is constantly hitting cgroup limits it increases lock contention in the kernel which slows down other tenants on the system which also use the same locks. This particular issue with cgroups accounting has been improved in later kernels.

3. If your latency-sensitive service runs on the same cores that service IRQs, tail latency can greatly increase when there is heavy IRQ load, for example from high-speed NIC IRQs. You can isolate those CPUs from the pool of CPUs offered to pods, but then you are dedicating 4-8 CPUs just to processing interrupts. Ideally you could run the non-guaranteed pods on the CPUs that service IRQs, but that is not supported by Kubernetes.

4. During full node memory pressure, the kernel does not respect memory.min and will reclaim pages of guaranteed QoS workloads.

5. The current implementation of memory QoS does not adjust memory.max of the burstable pod slice, so burstable pods can take up the entire free memory of the kubepods slice, which starves new memory allocations from guaranteed pods.

Don't even get me started on NUMA issues.

jeffbee•6mo ago
There isn't any way on Linux to deal with processes that create dirty pages. It is folly to try. The only way to deal with it is to put I/O stuff on a whole box/node by itself, and outlaw block I/O on all other nodes.
hinkley•6mo ago
I suspect you can only really count on neighbors to take care of their own. Anything else they see will be taken as an entitlement.

So for instance if you run three processes for the same customer, can you set them to use the same cpu slices and deal with one of their apps occasionally needing a burst of CPU?

__turbobrew__•6mo ago
Sure, in theory you could do that, but Kubernetes does not support overriding the top-level cgroup a pod is assigned to.
immibis•6mo ago
Can't find the article where I first read it (something like "Queuing theory for software engineers"), but average latency increases as, IIRC, service time ÷ (1 - utilization). Get half as close to 100% utilization and you double your average latency: a system at 87.5% utilization has double the latency of one at 75%. At 100% it's infinity (averaged over infinite time; on shorter timescales it's an unpredictable scale-free random walk).

This is fundamental: the closer utilization is to 100%, the higher the chance that a newly arriving work item has to wait for one that's already running, plus several already in the queue. What's astonishing is how steep that curve is. At 95% utilization the average queue length is about 20 tasks. At 99% it's 100 tasks. At 99.9% it's 1000 tasks. If you find yourself at 98% utilization, you should not think "nice, I'm fully utilizing the server I paid for"; you should buy another server and lower it to 49%. (Or optimize the code more.)

One way to deal with this is to have separate low-latency and high-latency queues. You can then run low-latency tasks at, say, 50% utilization and fill up the idle time with high-latency tasks. Presuming you actually want the high-latency tasks to ever get done, you can't guarantee 100% utilization, but you can get arbitrarily close as long as there's high-latency work to do. I have no idea whether this is something Kubernetes can do. You can of course have more than two priority levels.

This applies everywhere there's a queue, which is basically everywhere there's a contended resource. Hyperscalers know this. It's even been theorized that S3 Glacier is just the super-low-priority disk access queue on regular AWS servers (but Amazon won't tell us).
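For readers who want to check the numbers in the comment above, here is the standard M/M/1 back-of-the-envelope version of that formula (a sketch assuming random arrivals and a single server; real systems will differ).

```python
# Mean time in system and mean number of tasks in the system for an M/M/1
# queue: latency = service_time / (1 - utilization), tasks = util / (1 - util).
def mean_latency(service_time: float, utilization: float) -> float:
    return service_time / (1.0 - utilization)

def mean_tasks_in_system(utilization: float) -> float:
    return utilization / (1.0 - utilization)

for util in (0.50, 0.75, 0.875, 0.95, 0.99, 0.999):
    print(f"util={util:.3f}  latency={mean_latency(1.0, util):7.1f}x service time"
          f"  tasks_in_system={mean_tasks_in_system(util):7.1f}")

# 0.75 -> 4x, 0.875 -> 8x (halving the distance to 100% doubles latency);
# 0.95 -> ~19 tasks, 0.99 -> 99, 0.999 -> 999, matching the rough
# 20/100/1000 figures in the comment above.
```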

remram•6mo ago
Maybe one of these? https://dzone.com/articles/queuing-theory-for-software-engin... https://medium.com/@quebostina/stack-and-queue-are-two-of-th...
msarnowicz•6mo ago
Reading through the description of how cgroups are used in Kubernetes, I can see some similarities and some differences as well. It is interesting to compare the approaches.

We chose not to use cpu.weight, and instead divide the host explicitly using cgroups (slices in systemd). We put Standard VMs in dedicated slices to keep them isolated and let several Burstable VMs share a slice. This provides a trade-off between the price of the VM and the resource guarantees.

We use cpu.max.burst to allow the VMs to "expand" a bit, while understanding that this creates a "noisy neighbor" problem. At the same time, there is a minimum CPU guarantee. cgroups provide all those knobs and give a lot of control; combining them in various ways is an interesting puzzle.
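As a simplified picture of that kind of layout (not the exact Ubicloud hierarchy): a dedicated group per Standard VM with a hard cpu.max, and one shared parent for Burstables whose children get a small quota plus a burst allowance. Paths and numbers are invented for illustration.

```python
import os

ROOT = "/sys/fs/cgroup"

def make_group(path: str, cpu_max: str, burst_us: int = 0) -> None:
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "cpu.max"), "w") as f:
        f.write(cpu_max)                  # "<quota_us> <period_us>"
    if burst_us:
        with open(os.path.join(path, "cpu.max.burst"), "w") as f:
            f.write(str(burst_us))

# A Standard VM: its own group, capped at two full cores (200 ms per 100 ms period).
make_group(f"{ROOT}/standard-vm-1", "200000 100000")

# Burstable VMs: a shared parent capped at two cores...
make_group(f"{ROOT}/burstables", "200000 100000")
# ...whose children need the cpu controller enabled before they get cpu.max files.
with open(f"{ROOT}/burstables/cgroup.subtree_control", "w") as f:
    f.write("+cpu")
# Each child is capped at 50% of a core per period, with a burst allowance on
# top; the parent cap bounds what the whole group can take from the host.
for name in ("burstable-vm-1", "burstable-vm-2", "burstable-vm-3"):
    make_group(f"{ROOT}/burstables/{name}", "50000 100000", burst_us=50000)
```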

solarkraft•6mo ago
My main takeaway from this is that you can control KVM VMs with cgroups just like normal processes. I didn’t expect that.
msarnowicz•6mo ago
I am glad you found this useful!
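Picking up solarkraft's takeaway: a QEMU/KVM guest is just a host process, so capping it is the same cgroup.procs dance as for any other PID. A minimal sketch, where the group name, the cap, and the way the PID is found are all assumptions for the example:

```python
import os
import subprocess

CG = "/sys/fs/cgroup/vm-cap"            # hypothetical group
os.makedirs(CG, exist_ok=True)

# Cap the whole VM at 1.5 cores (150 ms of CPU time per 100 ms period).
with open(f"{CG}/cpu.max", "w") as f:
    f.write("150000 100000")

# Find a running QEMU process; pgrep against the command line is one crude way.
pid = subprocess.run(["pgrep", "-o", "-f", "qemu-system"],
                     capture_output=True, text=True, check=True).stdout.strip()

# Writing the PID to cgroup.procs migrates the whole process, so every vCPU
# thread of the guest ends up scheduled under this group's limits.
with open(f"{CG}/cgroup.procs", "w") as f:
    f.write(pid)
```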