2011 is around the time that programmers start taking undefined behavior seriously as an actual bug in their code and not in the compiler, especially as we start to see the birth of tools to better diagnose undefined behavior issues that compilers didn't (yet) take advantage of. There's also a set of major, language-breaking changes to the C and C++ standards that took effect around that time (e.g., C99 introduced inline with different semantics from gcc's extension, which broke a lot of software when gcc finally switched the default from C89 to C11 around 2014). And newer language versions tend to make obsolete the hacky workarounds that end up being more brittle because they take advantage of unintentional complexity (e.g., `if constexpr` removes the need for a decent chunk of template metaprogramming that relied on SFINAE, a concept which is difficult to explain even to knowledgeable C++ programmers). So in general, newer code is likelier to be substantially more compatible with future compilers and future language changes.
But on the other hand, we've also seen a greater trend towards libraries with less-well-defined and less stable APIs, which means future software is probably going to have a rougher time getting all the libraries to play nice with each other if you're trying to work with old versions. Even worse, modern software tends to be a lot more aggressive about dropping compatibility with obsolete systems. Accessing the modern web with decade-old software (as mentioned in the blog post) is going to be incredibly difficult, for example.
Is this due to changing default values for the standard used, and would it be "fixed" by adding "-std=xxx" to the CXXFLAGS?
I've successfully built ~2011 era LLVM using gcc last year, with no issues from the compiler itself (after that option change). There were a couple of bugs in the LLVM code that I had to work around, though (mainly reliance on transitive includes from the standard library, or incorrect LLVM code that newer compilers detect).
One of the big pain points I have with c++ is the dogmatic support of "old" code, I'd argue to the current version's detriment. But because of that I've never had an issue with code version backwards compatibility.
That said, failures in building old software are very often due to one of:
* transitive headers (as you mentioned)
* typedef changes (`siginfo_t` vs `struct siginfo` comes to mind)
* macros with bad names (I was involved in the zlib `ON` drama)
* changes in library arrangement (the ncurses/tinfo split comes to mind, libcurl3/4 conditional ABI change, abuse of `dlopen`)
Most of these are one-line fixes if you're willing to patch the old code, which significantly increases the range of versions supported and thus reduces the number of artifacts you need to build for bootstrapping all the way to a modern version.
Some of it is deliberately undefined in the standard so that compilers can use it, e.g. it's UB to use a reserved identifier so that compilers & future standard versions can add new keywords. This is why C's boolean type first got named `_Bool` and C++ defines `__cplusplus`: identifiers starting with an underscore and a capital letter or with two underscores are reserved, and using reserved identifiers is Undefined Behavior.
Some of it is that the compiler authors know how their compiler will generate code, and can rely on changing internal uses of UB when they change the code generation.
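As a rough illustration of the reserved-identifier rule described above, here is a minimal Python sketch that checks a name against exactly the two patterns mentioned (underscore plus capital letter, or two leading underscores). The function name `is_reserved_identifier` is made up for this example, and the check deliberately ignores the other reservation rules in the C and C++ standards (for instance, C++ also reserves names containing a double underscore anywhere).

```python
import re

# Rule as stated in the comment above: a leading underscore followed by a
# capital letter, or two leading underscores.
# Assumption: this is a simplification; the standards reserve more than this
# (e.g. leading-underscore names at global scope).
_RESERVED = re.compile(r"^(_[A-Z]|__)")


def is_reserved_identifier(name: str) -> bool:
    """Return True if `name` matches the reserved patterns described above."""
    return bool(_RESERVED.match(name))


for name in ["_Bool", "__cplusplus", "bool", "my_var", "_private"]:
    verdict = "reserved" if is_reserved_identifier(name) else "not covered by this rule"
    print(f"{name}: {verdict}")
```

Note that `_private` is reported as not covered only because this sketch implements the narrow rule quoted above; it may still be reserved in some scopes.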
Could you link to something about it? It's the first time I hear about it.
That sounds a bit worrying from a "reflections on trusting trust" perspective. Who's to say that those non-public commits didn't introduce a compiler backdoor? But I guess the more likely explanation is that somebody did some last-minute hotfixes that were later reworked before inclusion in the permanent record.
Debian FTW.
Could this simple non-checking Rust implementation transliterate the real Rust compiler's code, to unchecked C, that is good enough for that minimal-steps, sustainable bootstrapping?
This simple non-checking compiler only has to be able to compile one program, and only under controlled conditions, possibly only on hardware with a ton of memory.
Some time from now x86_64 will fade away, and there's a large chance rust will still be around. I know that this will probably take a long time, but it's better and easier to do it now than later.
Compiling the rust code for the compiler how? The whole point is that we don't have rustc.
This is a made up restriction.
Ken Thompson, "Reflections on Trusting Trust". https://dl.acm.org/doi/10.1145/358198.358210
This is actually tenable for C, though - so maybe you could cook up some sort of C -> C++ -> LLVM -> rustc bootstrap.
fcoury•7mo ago
superkuh•7mo ago
neilv•7mo ago
Regardless of whether Cloudflare is the particular infra company, the company who uses them responds to blocked people: "We don't know why some users can't access our Web site, and we don't even know the percentage of users who get blocked, but we're just cargo-culting our jobs here, so sux2bu."
The outsourced infra company's response is: "We're running a business here, and our current solution works well enough for that purpose, so sux2bu."
o11c•7mo ago
So I propose "cloudfart" - just rude enough it can't be casually dismissed, but still tolerable in polite company. "I can't access your website (through the cloudfart |, it's just cloudfarting at me)."
Other names (not all applicable for this exact use): cloudfable, cloudunfair, cloudfalse, cloudfarce, cloudfault, cloudfear, cloudfeeble, cloudfeudalism, cloudflake, cloudfluke, cloudfreeze, cloudfuneral.
neilv•7mo ago
Not just sound like we're taking in stride an unavoidable fact of nature.
Want people to stop saying "CloudFlareup" (like a social disease)? Stop causing it.
tmtvl•7mo ago
gregorvand•7mo ago
CaptainFever•7mo ago
In case others can't access the archive link:
Elsewhere I've been asked about the task of replaying the bootstrap process for rust. I figured it would be fairly straightforward, if slow. But as we got into it, there were just enough tricky / non-obvious bits in the process that it's worth making some notes here for posterity.
context
Rust started its life as a compiler written in OCaml, called rustboot. This compiler did not use LLVM, it just emitted 32-bit i386 machine code in 3 object file formats (Linux ELF, macOS Mach-O, and Windows PE).
We then wrote a second compiler in Rust called rustc that did use LLVM as its backend (and which, yes, is the genesis of today's rustc) and ran rustboot on rustc to produce a so-called "stage0 rustc". Then stage0 rustc was fed the sources of rustc again, producing a stage1 rustc. Successfully executing this stage0 -> stage1 step (rather than just crashing mid-compilation) is what we're going to call "bootstrapping". There's also a third step: running stage1 rustc on rustc's sources again to get a stage2 rustc and checking that it is bit-identical to the stage1 rustc. Successfully doing that we're going to call "fixpoint".
Shortly after we reached the fixpoint we discarded rustboot. We stored stage1 rustc binaries as snapshots on a shared download server and all subsequent rust builds were based on downloading and running that. Any time there was an incompatible language change made, we'd add support and re-snapshot the resulting stage1, gradually growing a long list of snapshots marking the progress of rust over time.
time travel and bit rot
Each snapshot can typically only compile rust code in the rust repository written between its birth and the next snapshot. This makes replaying the entire history awkward. We're not going to do that here. This post is just about replaying the initial bootstrap and fixpoint, which happened back in April 2011, 14 years ago.
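To make the snapshot constraint concrete, here is a small Python sketch of the lookup a full replay would need: given a commit date, find the newest snapshot born on or before it. The snapshot dates and the `snapshot_for` helper are hypothetical, invented purely for illustration; they are not taken from the real snapshot list.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical snapshot birth dates (NOT real data). ISO date strings sort
# chronologically, so a sorted list supports binary search directly.
SNAPSHOTS = ["2011-04-20", "2011-05-03", "2011-06-15"]


def snapshot_for(commit_date: str) -> Optional[str]:
    """Return the newest snapshot born on or before commit_date, if any."""
    i = bisect_right(SNAPSHOTS, commit_date)
    return SNAPSHOTS[i - 1] if i > 0 else None


print(snapshot_for("2011-05-10"))  # falls between the 2nd and 3rd snapshot
print(snapshot_for("2011-01-01"))  # predates every snapshot
```

A full replay would have to repeat this for every commit and rebuild each snapshot from its predecessor, which is why the post limits itself to the first bootstrap.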
Unfortunately all the tools involved -- from the host OS and system libraries involved to compilers and compiler-components -- were and are moving targets. Everything bitrots. Some examples discovered along the way:
debian
We're in a certain amount of luck though:
rust
The next problem is figuring out the code to build. Not totally trivial but not too hard. The best resource for tracking this period of time in rust's history is actually the rust-dev mailing list archive. There's a copy online at mail-archive.com (and Brian keeps a public backup of the mbox file in case that goes away). Here's the announcement that we hit a fixpoint in April 2011. You kinda have to just know that's what to look for. So that's the rust commit to use: 6daf440037cb10baab332fde2b471712a3a42c76. This commit still exists in the rust-lang/rust repo, no problem getting it (besides having to copy it into the container since the container can't contact github, haha).
LLVM
Unfortunately we only started pinning LLVM to specific versions, using submodules, after bootstrap, closer to the initial "0.1 release". So we have to guess at the LLVM version to use. To add some difficulty: LLVM at the time was developed on subversion, and we were developing rust against a fork of a git mirror of their SVN. Fishing around in that repo at least finds a version that builds -- 45e1a53efd40a594fa8bb59aee75bb0984770d29, which is "the commit that exposed LLVMAddEarlyCSEPass", a symbol used in the rustc LLVM interface. I bootstrapped with that (brson/llvm) commit but subversion also numbers all commits, and they were preserved in the conversion to the modern LLVM repo, so you can see the same svn id 129087 as e4e4e3758097d7967fa6edf4ff878ba430f84f6e over in the official LLVM git repo, in case brson/llvm goes away in the future.
Configuring LLVM for this build is also a little bit subtle. The best bet is to actually read the rust 0.1 configure script -- when it was managing the LLVM build itself -- and work out what it would have done. But I have done that and can now save you the effort:
./configure --enable-targets=x86 --build=i686-unknown-linux-gnu --host=i686-unknown-linux-gnu --target=i686-unknown-linux-gnu --disable-docs --disable-jit --enable-bindings=none --disable-threads --disable-pthreads --enable-optimized
So: configure and build that, stick the resulting bin dir in your path, and configure and make rust, and you're good to go!
root@65b73ba6edcc:/src/rust# sha1sum stage*/rustc
639f3ab8351d839ede644b090dae90ec2245dfff  stage0/rustc
81e8f14fcf155e1946f4b7bf88cefc20dba32bb9  stage1/rustc
81e8f14fcf155e1946f4b7bf88cefc20dba32bb9  stage2/rustc
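The fixpoint check shown by the sha1sum output can also be scripted. The following is a minimal Python sketch, not part of the original post: `sha1_of` and `reached_fixpoint` are names made up here, and the `stage1/rustc` and `stage2/rustc` paths assume the script runs from the rust build directory.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reached_fixpoint(stage1: Path, stage2: Path) -> bool:
    """True when the two binaries hash identically (treated as bit-identical)."""
    return sha1_of(stage1) == sha1_of(stage2)


if __name__ == "__main__":
    # Assumed layout: run from the directory containing stage1/ and stage2/.
    s1, s2 = Path("stage1/rustc"), Path("stage2/rustc")
    if s1.exists() and s2.exists():
        print("fixpoint" if reached_fixpoint(s1, s2) else "stage1 and stage2 differ")
    else:
        print("stage binaries not found; nothing to compare")
```

Comparing digests rather than file contents mirrors what sha1sum does; a direct byte comparison would be an equally valid choice.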
Observations
On my machine I get: 1m50s to build stage0, 3m40s to build stage1, 2m2s to build stage2. Also stage0/rustc is a 4.4mb binary whereas stage1/rustc and stage2/rustc are (identical) 13mb binaries.
While this is somewhat congruent with my recollections -- rustboot produced code faster, but its code ran slower -- the effect size is actually much less than I remember. I'd convinced myself retroactively that rustboot produced abysmally worse code than rustc-with-LLVM. But out-of-the-gate LLVM only boosted performance by 2x (at a cost of 3x the code size)! Of course I also have a faster machine now. At the time bootstrap cycles took about a half hour each (according to this: 15 minutes for the 2nd stage).
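The "2x" and "3x" figures can be sanity-checked from the numbers quoted above. This short Python sketch assumes the comparison is stage1 build time (run by the rustboot-compiled stage0) against stage2 build time (run by the LLVM-compiled stage1), since both compile the same rustc sources; the variable names are made up for the example.

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + secs


# Timings and sizes as quoted in the post.
build_stage1 = seconds(3, 40)  # performed by stage0/rustc (rustboot-generated code)
build_stage2 = seconds(2, 2)   # performed by stage1/rustc (LLVM-generated code)
size_stage0_mb = 4.4
size_stage1_mb = 13

speedup = build_stage1 / build_stage2
size_ratio = size_stage1_mb / size_stage0_mb

print(f"speedup: {speedup:.2f}x, size ratio: {size_ratio:.2f}x")
# prints "speedup: 1.80x, size ratio: 2.95x"
```

So the measured speedup is a bit under 2x and the size growth just under 3x, consistent with the rounded figures in the text.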
Of course you can still see this as a condemnation of the entire "super slow dynamic polymorphism" model of rust-at-the-time, either way. It may seem funny that this version of rustc bootstraps faster than today's rustc, but this "can barely bootstrap" version was a mere 25kloc. Today's rustc is 600kloc. It's really comparing apples to oranges.