Usually there are only a couple of places that actually deal with user-controlled data, so switching to safe dependencies for things like generating thumbnails for PDF files can be effective.
Edit: One more thing that was not mentioned is compiling unsafe code to WebAssembly, or sandboxing it in other ways.
- It conflates data-race protection with memory safety, and it does so inconsistently. Java and C# are mentioned as MSLs, and yet they totally let you race. More fundamentally, data races aren't the thing that attackers exploit, except when those data races lead to actual memory corruption (use-after-free, double free, out-of-bounds access, access to allocator metadata, etc.). So it's more precise not to list data-race freedom as a requirement for memory safety, both because otherwise languages like Java and C# don't meet the definition despite being on the list, and because data races in the presence of memory safety are not a big deal from a security standpoint.
- The document fails to mention Fil-C. It would be understandable if it were mentioned with caveats ("new project", "performance blah blah"), but not mentioning it at all is silly.
This is meant to be a practical strategy that can be implemented nation-wide, without turning into another https://xkcd.com/2347
Seems like a bad way to pick technology.
They do mention things like TRACTOR. Fil-C is far ahead of any project under the TRACTOR umbrella.
> This is meant to be a practical strategy that can be implemented nation-wide, without turning into another https://xkcd.com/2347
The solution to that is funding the thing that is essential, rather than complaining that an essential thing is unfunded. DOD could do that.
Hmm, I take it that the situation is that there are a number of vendors/providers/distros/repos who could be distributing your memory-safe builds, but are currently still distributing unsafe builds?
I wonder if an organization like the Tor project [1] would be more motivated to "officially" distribute a Fil-C build, being that security is the whole point of their product. (I'm talking just their "onion router" [2], not (necessarily) the whole browser.)
I could imagine that once some organizations start officially shipping Fil-C builds, adoption might accelerate.
Also, have you talked to the Ladybird browser people? They seemed to be taking an interest in Fil-C.
This is a very sensible way to pick a technology for a government.
Having a cool proof of concept, with a bus factor of 1, and having a solution that countless government agencies can depend on for multi-million-dollar decades-long software projects are very different things.
They can't just depend out of the blue on you personally maintaining "Fil's Unbelievable Garbage Collector" for the lifetime of the government's projects. Maybe you believe they could, but it takes way more legwork to give such assurance to a government.
They list TRACTOR under projects they've already funded (and crucially, not among solutions they recommend yet). Apply for funding for Fil-C, and if it gets accepted, it'll probably get listed there too.
The TRACTOR approach also has higher tolerance to being an experimental project, because it's one-time conversion of C to Rust. It only needs to work once, not continuously for decades. The Rust-lang org is set up to offer serious long-term support, and is way past having a critical dependency on a single developer.
No, it's not, for the simple reason that the government has more than adequate resources to recreate a Fil-C-like system with a team, or even just add people power to Fil-C.
The fact that it only took one dude working in his spare time 1.5 years to make C memory safe suggests that the whole narrative of the OP is wrong. The problem isn't that people aren't using memory safe languages. The problem is that nobody is funding just making C memory safe.
That's not what people use C for. You're presenting it as a memory-safe C, but you've got a more fine-grained ASAN. That's useful, but it's not blowing away the whole narrative.
For running unfixable legacy C code there are already lower-overhead solutions. They're not as precise, but that's either not necessary for safety (e.g. where there's a right sandbox boundary), or the performance is so critical that people accept incomplete hardening despite the risks.
For new development, where a slower GC language is suitable, there are plenty of languages to choose from that are more convenient and less crash-prone.
There's already CHERI that takes a similar approach to pointer tagging, but they're doing it in hardware, because they know that software emulation makes the solution unappealing.
Says who?
Most software written in C is not perf sensitive. My shell could be 4x slower and I wouldn’t care.
That’s also true for most of the GUI stuff I use, including the browser.
> you've got a more fine-grained ASAN.
The difference between Fil-C and ASan is that Fil-C is memory safe while ASan isn't.
This has nothing to do with “fine grained”.
> it's not blowing away the whole narrative.
The narrative is that C is not a memory safe language. That narrative is false.
If the narrative was, "C is only memory safe if you're willing to pay a perf cost", then like whatever. But that's not what folks are saying.
> For running unfixable legacy C code there are already lower-overhead solutions. They're not as precise, but that's either not necessary for safety (e.g. where there's a right sandbox boundary), or the performance is so critical that people accept incomplete hardening despite the risks.
No there aren’t. Fil-C is the only memory safe solution for C code.
HWASan, MTE, etc. aren't memory safe. ASan isn't memory safe (and probably also isn't cheaper). Don't know what else you're thinking of.
> There's already CHERI that takes a similar approach to pointer tagging
Neither CHERI nor Fil-C uses pointer tagging. Both use pointer capabilities. Fil-C's capabilities are safer (they actually protect against use-after-free).
Fil-C is faster than CHERI because I can run Fil-C on fast commodity hardware. Fil-C on my x86 box is orders of magnitude faster than the fastest CHERI machine ever.
It is correct that data races in a garbage collected language are difficult to turn into exploits.
The problem is that data races in C and C++ do in fact get combined with other memory safety bugs into exploits.
A definition from first principles is still missing, but imagine it takes the form of "all memory access is free from UB". Then whether the pointer is in bounds, and whether no thread is concurrently mutating the location, seem to be quite similar constraints.
Rust does give ways to control concurrency, e.g. via expressing exclusive access through an &mut reference. So there is also precedent that the same mechanism can be used to ensure both the validity of a reference (not dangling) and the absence of concurrent access.
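As a minimal illustration (just the standard library, names invented), the compiler rejects a second reference and a concurrent mutation for the same underlying reason: the &mut borrow is exclusive while it's live.

    fn main() {
        let mut balance = 100_i64;
        let r = &mut balance; // &mut = provably exclusive access

        // Both of these are rejected at compile time for the same reason:
        // the exclusive borrow held by `r` is still live.
        //
        // let alias = &balance;                     // error: already mutably borrowed
        // std::thread::spawn(move || balance += 1); // error: `balance` is borrowed

        *r -= 1;
        println!("{balance}");
    }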
Because C and C++ are not memory safe.
> A definition from first principles is still missing, but imagine it takes the form of "all memory access is free from UB". Then whether the pointer is in-bounds, or whether no thread is concurrently mutating the location seem to be quite similar constraints.
I think it's useful to work backwards from the languages that security folks say are "memory safe", since what they're really saying is, "I cannot use the attacks I'm familiar with against programs written in these languages".
Based on that, saying "no UB" isn't enough, and only looking at memory accesses isn't enough.
WebAssembly has no UB, but pointers are defined to just be integers (i.e. the UB-free structured-assembly semantics of a C programmer's dreams). So attackers can do OOB and UAF data attacks within the wasm memory. The only thing attackers cannot do is control the instruction pointer or escape the wasm memory (unless the wasm embedder has a weak sandbox policy, in which case they can do both). Overall, I think that memory-safety-in-the-sense-of-wasm isn't really memory safety at all. It's too exploitable.
To be memory safe like the "MSLs" that security folks speak of, you also need to consider stuff like function calls. Depending on the language, you might have to look at other stuff, too.
I think that what security folks consider "memory safe" is the combination of these things:
1) Absence of UB. Every outcome is well defined.
2) Pointers (or whatever pointer-like construct your language has) can only be used to access the allocation they originated from (i.e. pointers carry capabilities).
And it's important that these get "strongly" combined; i.e. there is no operation in the language that could be used to break a pointer's capability enforcement.
Java and Fil-C both have a strong combination of (1) and (2).
But, long story short, it's true that a definition of memory safety from first principles is missing in the sense that the field hasn't settled on a consensus for what the definition should be. It's controversial because you could argue that under my definition, Rust isn't memory safe (you can get to UB in Rust). And, you could argue that wasm meets (2) because "the allocation" is just "all of memory". I'm not even sure I like my own definition. At some point you have to say more words about what an allocation is.
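For instance, here's a minimal sketch of that Rust caveat (illustrative only, not from the document): the `unsafe` escape hatch can reach UB even though the safe subset can't.

    fn main() {
        let v = vec![1u8, 2, 3];
        let p = v.as_ptr();
        drop(v); // the allocation behind `p` is freed here

        // Compiles fine, but this is a use-after-free: UB, reachable
        // only because we opted into `unsafe`.
        let x = unsafe { *p };
        println!("{x}");
    }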
This is absolutely not true. One of the classic data races is when you do a set of operations like this non-atomically:
    new_total = account.balance;
    new_total -= purchase_price;
    account.balance = new_total;
Which is a huge security vulnerability, because it lets people double-spend: Alice buys something for $1000 and something for $1, and instead of debiting her account by $1001 it debits it by $1, because the write for the second transaction clobbers the balance reduction from the first one.
Another common one is symbolic links. You check the target of a symbolic link and then access it, but between the check and the access the link changed, and now you're leaking secrets or overwriting privileged data.
Data races are serious vulnerabilities completely independent of memory safety.
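To make the lost update concrete, here's a minimal Rust sketch (amounts and names invented): every access is memory safe and UB-free, yet the separate load and store still let one purchase clobber the other.

    use std::sync::atomic::{AtomicI64, Ordering};
    use std::thread;

    fn main() {
        let balance = AtomicI64::new(1001);
        let balance = &balance; // shared reference; fine to copy into both threads

        thread::scope(|s| {
            for price in [1000_i64, 1] {
                s.spawn(move || {
                    // Separate load and store: no UB, but the update can be lost.
                    let new_total = balance.load(Ordering::SeqCst) - price;
                    balance.store(new_total, Ordering::SeqCst);
                });
            }
        });

        // Usually prints 0, but can print 1 or 1000 if one write clobbers the other.
        println!("{}", balance.load(Ordering::SeqCst));
    }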
Data races aren't like any real world experience. The way the machine actually works is too alien for us to get our heads around so we're provided with a grossly simplified "sequentially consistent" illusion when writing high level languages like C - in which things happen in some order. Data races are reality "bleeding through" if we don't follow the rules to preserve that illusion.
Time of check to time of use
https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
I didn't know this, thank you
Logic errors not preventable by any language or type system (like making sure you enforce policy in a setuid process) are far more likely than that.
Races in a database are not "data races" in the programming-language sense, unless we're debating what query language to use.
That's kind of the point.
> Races in a database are not “data races” in the programming language sense
Only in the sense that in a sufficiently large bureaucracy you get to power up the Somebody Else's Problem Field and blame the DBA for it. But that doesn't get you the missing money back.
And lots of programming languages provide tools to address data races. SQL has transactions, several languages have compare-and-swap primitives, etc.
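For example, a minimal Rust sketch of the compare-and-swap fix for the balance race above (the `debit` helper is invented for illustration): the write only lands if the balance is still the value we read; otherwise we retry.

    use std::sync::atomic::{AtomicI64, Ordering};

    // Debit `price`, retrying if another thread changed the balance in between.
    fn debit(balance: &AtomicI64, price: i64) {
        let mut current = balance.load(Ordering::SeqCst);
        loop {
            match balance.compare_exchange(
                current,
                current - price,
                Ordering::SeqCst,
                Ordering::SeqCst,
            ) {
                Ok(_) => return,                 // nobody raced us; the debit landed
                Err(actual) => current = actual, // someone did; retry with the new value
            }
        }
    }

    fn main() {
        let balance = AtomicI64::new(1001);
        debit(&balance, 1000);
        debit(&balance, 1);
        assert_eq!(balance.load(Ordering::SeqCst), 0);
    }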
In Java, a data race means loss of sequential consistency. Humans generally don't understand programs which lack sequential consistency, so a typical Java team probably can't debug the program, but the program still always has well-defined behaviour - and chances are you don't want to debug the weird non-sequentially-consistent behaviour anyway; you just want them to fix the data race.
In C# data races are not too dangerous for trivial objects which are valid for all bit patterns. If you race an integer k, well, now k is smashed, don't think too hard about the value of k, it does have some value but it won't go well for you to try to reason about the value. For a complex object like a hash table, it's Undefined Behaviour.
Meanwhile in C or C++ all data races are immediate UB, you lose, game over.
The reason why YOLO-C has the bad data race behavior is because (1) data races might lead to memory safety violations and (2) the compiler is permitted to play more fast and loose than strictly necessary. Fil-C fixes (1) by making the language memory safe. Fil-C fixes (2) by just having different policies in the compiler.
Note, though, that data races can make otherwise memory-safe programs not actually memory safe. See, for example, Go.
For what it's worth, from what I've read on here (e.g., [0]) this has yet to be exploited in a non-demonstration setting, but who knows if/when the first such exploit will appear.
Or, it means that "Go is only memory safe provided you have no data races".
My point is that it's weird to say that there is a notion of memory safety that is separate from the memory safety you get if you also have a story for data races. It leads to exactly the confusion in the OP: it's not clear if they're saying that memory safety subsumes data race freedom in the sense that you're memory-safe even in the presence of races (like Java or Fil-C), or that it means that memory safety subsumes data race freedom in the sense that Rust's type system handles both memory safety and data races using the same basic technique.
They offer default protections that can be easily overridden in most of those languages. Some of them require you to use those overrides to implement common data structures.
> MSLs can prevent entire classes of vulnerabilities, such as buffer overflows, dangling pointers, and numerous other Common Weakness Enumeration (CWE) vulnerabilities.
If used a certain way.
> Android team made a strategic decision to prioritize MSLs, specifically Rust and Java, for all new development
Was that /all/ they did?
> Invest initially in training, tools, and refactoring. This investment can usually be offset by long-term savings through reduced downtime, fewer vulnerabilities, and enhanced developer efficiency.
That is an exceedingly dubious claim to make in general.
About where that number was twenty years previous.
The big difference is that twenty years ago, the enemy was script kiddies. Now it's competent teams funded by multiple nation-states.
I have no clue who is carrying the torch now. Someone, probably, given that OP's document references Rust 16 times and Java just 4. But we'll have to see how CISA shakes out after its funding gets cut due to alleged mission creep.