Preparing for the .NET 10 GC

https://maoni0.medium.com/preparing-for-the-net-10-gc-88718b261ef2

57•benaadams•5h ago

Comments

orphea•3h ago

For those who like me was left wondering what DATAS is, here is the link:

https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...

gwbas1c•3h ago

Yeah, I kept scrolling to the top to see if I overlooked something.

Then I realized, "oh, it's hosted on Medium." (I generally find Medium posts to be very low quality.) In this case, the author implies that they are on the .Net team, so I'm continuing to read.

(At least I hope the author actually is on the .Net team and isn't blowing hot air, because it's a Medium post and not something from an official MS blog.)

olidb•2h ago

Maoni Stephens is indeed on the .net team and is, as far as I know, the lead architect of the .net garbabe collector for many years: https://github.com/Maoni0 https://devblogs.microsoft.com/dotnet/author/maoni/

Therefore she's probably the person with the most knowledge about the .net GC but maybe not the best writer (I haven't read the article yet).

moomin•1h ago

The writing itself is fine, but she’s assuming a LOT of knowledge e.g. what a GC0 budget is and what increasing it means.

bob1029•3h ago

> Maximum throughput (measured in RPS) shows a 2-3% reduction, but with a working set improvement of over 80%.

I have a hard time finding this approach compelling. The amount of additional GC required in their example seems extreme to me.

bilekas•3h ago

It's incredibly frustrating the author doesn't actually say "Garbage Collector (GC)" I'm aware but something niggling in the back of my head had me second guessing.

nu11ptr•3h ago

Even worse: they don't explain what the DATAS acronym means. Seems like the author makes too many assumptions about the knowledge base of their reader IMO.

stonemetal12•2h ago

I am guessing he doesn't expect linking from outside. The blog post before this one starts: "In this blog post I’m going to talk about our .NET 8.0 GC feature called DATAS (Dynamic Adaptation To Application Sizes)."

Akronymus•10m ago

Because everyone knows at least the formulas for quartz, of course

https://xkcd.com/2501/

gwbas1c•3h ago

This post would carry a lot more authority if it was on an official MS or .net blog; instead of Medium. (I typically associate Medium with low-quality blog entries and don't read them.)

justin66•3h ago

Or if the author used their real name.

giancarlostoro•2h ago

Agree. It's not like a blogpost that is about grey hat subjects or something.

artimaeis•2h ago

For what it's worth, Maoni is the author's real name. Maoni0 is what they go by everywhere. You can find interviews and plenty of their other content if you search around a bit.

justin66•2h ago

Using a handle instead of their full name on an article is a choice. The first impression is not “knowledgeable employee making post about company’s product.”

Posting from a Microsoft blog would to some extent fix this, to the OP’s point.

(I know - who cares. But first impressions are what they are)

nu11ptr•2h ago

I don't generally find them low quality, but I do wish people wouldn't use it since I don't subscribe to it.

giancarlostoro•2h ago

Its the Pinterest of blogs, its really annoying.

giancarlostoro•2h ago

Moreso if the authors profile picture wasn't what looks like a memecat. I can't exactly share this around without feeling like they'll judge it based on that alone.

pestkranker•2h ago

Maoni0 is the mastermind behind the .NET GC. They won't judge you, and if they do, that's their problem.

pjmlp•2h ago

The author is one of the main GC architects on .NET, so we in the known are aware of who she is.

Here is an interview with her,

https://www.youtube.com/watch?v=ujkSnko0JNQ

Having said this, I agree with you, the Aspire/MAUI architects do the same, I really don't get why we have to search for this kind of blog posts on other platforms instead of DevBlogs.

deburo•2h ago

They should probably cross-post in both.

gwbas1c•2h ago

One anecdote from working with .Net for over 20 years: I've had a few situations where someone (who isn't a programmer and/or doesn't work with .Net) insists that the application has a memory leak.

First, I explain that garbage collected applications don't release memory immediately. Then I get sucked into a wild goose chase looking for a memory leak that doesn't exist. Finally, I point out that the behavior they see is normal, usually to some grumbling.

From what I can tell, DATAS basically makes a .Net application have a normal memory footprint. Otherwise, .Net is quite a pig when it comes to memory. https://github.com/GWBasic/soft_matrix, implemented in Rust, generally has very low memory consumption. An earlier version that I wrote in C# would consume gigabytes of memory (and often run out of memory when run on Mono with the Bohem garbage collector.)

---

> If startup perf is critical, DATAS is not for you

This is one of my big frustrations with .net, (although I tend to look at how dependency injection is implemented as a bigger culprit.)

It does make me wonder: How practical is it to just use traditional reference counting and then periodically do a mark-and-sweep? I know it's a very different approach than .net was designed for. (Because they deliberately decided that dereferencing an object should have no computational cost.) It's more of a rhetorical question.

nu11ptr•2h ago

> It does make me wonder: How practical is it to just use traditional reference counting and then periodically do a mark-and-sweep? I know it's a very different approach than .net was designed for. (Because they deliberately decided that dereferencing an object should have no computational cost.) It's more of a rhetorical question.

This is what CPython does. The trade off is solidly worse allocator performance, however. You also have the reference counting overhead, which is not trivial unless it is deferred.

There is always a connection between the allocator and collector. If you use a compacting collector (which I assumed .NET does), you get bump pointer allocation, which is very fast. However, if you use a non-compacting collector (mark-and-sweep is non-compacting), you would then fallback to a normal free list allocator (aka as "malloc") which has solidly higher overhead. You can see the impact of this (and reference counting) in any benchmark that builds a tree (and therefore is highly contended on allocation). This is also why languages that use free list allocation often have some sort of "arena" library, so they can have high speed bump pointer allocation in hot spots (and then free all that memory at once later on).

BTW, reference counting, malloc/free performance also impact Rust, but given Rust's heavy reliance on the stack it often doesn't impact performance much (aka just doing less allocations). For allocation heavy code, many of us use MiMalloc one of the better malloc/free implementations.

gwbas1c•2h ago

So basically you're trading lowering RAM consumption for higher CPU consumption?

FWIW: When I look at Azure costs, RAM tends to cost more than CPU. So the tradeoffs of using a "slower" memory manager might be justified.

nu11ptr•2h ago

It depends on workload. It is difficult to quantify the trade offs without knowing that.

The problem is in languages like C#/Java almost everything is an allocation, so I don't really think reference counting would work well there. I suspect this is the reason PyPy doesn't use reference counting, it is a big slowdown for CPython. Reference counting really only works well in languages with low allocations. Go mostly gets away with a non-compacting mark-sweep collector because it has low level control that allows many things to sit on the stack (like Rust/C/C++, etc.).

adgjlsfhk1•1h ago

C# is a lot better than Java on this front since they support stack allocated structs

whaleofatw2022•2h ago

Dotnet does both mark and sweep as well as compaction, depends on what type of GC happens.

gwbas1c•1h ago

In this case, we're discussing a case where mark-and-sweep is used to collect cyclic references, and it's implied that there are no generations. (Because otherwise, purely relying on reference counting means that cyclic references end up leaking unless things like weak references are used.)

IE, the critical difference is that reference counting frees memory immediately; albeit at a higher CPU cost and needing to still perform a mark-and-sweep to clear out cyclic references.

SideburnsOfDoom•27m ago

> First, I explain that garbage collected applications don't release memory immediately. ... I point out that the behavior they see is normal

yes, this is an easily overlooked point: Using memory when it going free is by design. It is often better to use use up cheap, unused memory instead of expensive CPU doing a GC. When memory is plentiful as it often is, then it is faster to just not run a GC yet.

You're not in trouble unless you run short of memory, and a necessary GC does not free up enough. Then only can you call it an issue.

kg•14m ago

One of the main problems with refcounting is that unless your compiler/JIT are able to safely, aggressively optimize out reference increment/decrements, you can spend a ton of CPU time pointlessly bumping a counter up and down every time you enter a new function/method. This has been a problem for ObjC and Swift applications in the past AFAIK, though both of those compilers do a great job of optimizing that stuff out where possible.

There are some other things that would probably be improvements coming along with refcounting though - you might be able to get rid of GC write barriers.

jcmontx•2h ago

But don't you take a hit in performance by running the GC more often?

NetMageSCW•2h ago

Not necessarily if you have more (so smaller) heaps so each GC takes less time.

stonemetal12•2h ago

Maybe, maybe not. If GC is a O(n^2) then running it twice at n=5 is a much shorter run time than once at n=10.

daxfohl•2h ago

Maybe I missed it, but is there a shadow mode to estimate the memory and perf impact without actually enabling the feature? Or better yet, a way to analyze existing dotnet 8 GC logs to understand the approx impact?

graycat•4m ago

For the author, some definitions:

GC? -- Maybe "Garbage Collection", i.e., have some memory (mainly computer main memory) allocated, don't need it (just now or forever), and want to release it, i.e., no longer have it allocated for its original purpose. By releasing can make it available for other purposes, software threads, programs, virtual machines, etc.

DATAS -- Not a spelling error or about any usual meaning for data and instead as in

https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...

for "Dynamic adaptation to application sizes"

So, we're trying to take actions over time in response to some inputs that are in some respects unpredictable.

Okay, what is the objective, i.e., the reason, what we hope to gain, or why bother?

And for the part that is somewhat unpredictable over time, that's one or more stochastic processes (or one multidimensional stochastic process?).

So, in broad terms, we are interested in stochastic optimal control. "Dynamic adaptation", is close and also close to one method, dynamic programming -- in an earlier thread at Hacker News, gave a list of references. Confession, wrote my applied math Ph.D. dissertation in that subject.

Hmm, how to proceed??? Maybe, (A) Know more about the context, e.g., what the computer is doing, what's to be minimized or maximized. (B) Collect some data on the way to knowing more about the stochastic processes involved.

For me, how to get paid? If tried to make a living from applied stochastic optimal control, would have died from starvation. Got the Ph.D. JUST to be better prepared as an employee for such problems and had to learn that NO one, not even one in the galaxy, cares as much as one photon of ~1 Hz light.

So, am starting a business heavily in computing and applied math. The code from Microsoft tools is all in .NET, ASP.NET, ADO.NET, etc. Code runs fine. The .NET software, via the VB.NET

syntactic sugar*. is GREAT for writing the code.

So, MUST keep up on Microsoft tools, and here just did that. Since .NET 10 is changing some versions of Windows, my reaction is (i) add a lot of main memory until GC is nearly irrelevant, (ii) in general, wait a few years to give Microsoft time to fix problems, i.e., usually be a few years behind the latest versions.

Experience: At one time, saw some server farms big on reliability. One site had two of everything, one for the real work and another to test the latest for bugs before being used for real work. Another had their own electrical power, Diesel generators ~30 feet high, a second site duplicating everything, ~400 miles away, with every site with lots of redundancy. In such contexts, working hard and taking risks trying to save money on main memory seem unwise.

Yt-dlp: Upcoming new requirements for YouTube downloads

That Secret Service SIM farm story is bogus

SedonaDB: A new geospatial DataFrame library written in Rust

US Airlines Push to Strip Away Travelers' Rights by Rolling Back Key Protections

Python on the Edge: Fast, sandboxed, and powered by WebAssembly

Learning Persian with Anki, ChatGPT and YouTube

How to Lead in a Room Full of Experts

Who Funds Misfit Research?

Smartphone Cameras Go Hyperspectral

The Lambda Calculus – Stanford Encyclopedia of Philosophy

EU age verification app not planning desktop support

How to Be a Leader When the Vibes Are Off

How HubSpot Scaled AI Adoption

New bacteria, and two potential antibiotics, discovered in soil

Better Curl Saul: a lightweight API testing CLI focused on UX and simplicity

Zed's Pricing Has Changed: LLM Usage Is Now Token-Based

Rights groups urge UK PM Starmer to abandon plans for mandatory digital ID

S3 scales to petabytes a second on top of slow HDDs

My Ed(1) Toolbox

Preparing for the .NET 10 GC

Just Let Me Select Text

Everyone's trying vectors and graphs for AI memory. We went back to SQL

The DHS has been harvesting DNA from Americans for years

Exploring GrapheneOS secure allocator: Hardened Malloc

The Data Commons Model Context Protocol (MCP) Server

Huntington's disease treated for first time

My game's server is blocked in Spain whenever there's a football match on

Identity Types

I Spent Three Nights Solving Listen Labs Berghain Challenge (and Got #16)

Find SF parking cops

Yt-dlp: Upcoming new requirements for YouTube downloads

That Secret Service SIM farm story is bogus

SedonaDB: A new geospatial DataFrame library written in Rust

US Airlines Push to Strip Away Travelers' Rights by Rolling Back Key Protections

Python on the Edge: Fast, sandboxed, and powered by WebAssembly

Learning Persian with Anki, ChatGPT and YouTube

How to Lead in a Room Full of Experts

Who Funds Misfit Research?

Smartphone Cameras Go Hyperspectral

The Lambda Calculus – Stanford Encyclopedia of Philosophy

EU age verification app not planning desktop support

How to Be a Leader When the Vibes Are Off

How HubSpot Scaled AI Adoption

New bacteria, and two potential antibiotics, discovered in soil

Better Curl Saul: a lightweight API testing CLI focused on UX and simplicity

Zed's Pricing Has Changed: LLM Usage Is Now Token-Based

Rights groups urge UK PM Starmer to abandon plans for mandatory digital ID

S3 scales to petabytes a second on top of slow HDDs

My Ed(1) Toolbox

Preparing for the .NET 10 GC

Just Let Me Select Text

Everyone's trying vectors and graphs for AI memory. We went back to SQL

The DHS has been harvesting DNA from Americans for years

Exploring GrapheneOS secure allocator: Hardened Malloc

The Data Commons Model Context Protocol (MCP) Server

Huntington's disease treated for first time

My game's server is blocked in Spain whenever there's a football match on

Identity Types

I Spent Three Nights Solving Listen Labs Berghain Challenge (and Got #16)

Find SF parking cops

Preparing for the .NET 10 GC

Comments