I'm glad they tracked it down even further to figure out exactly why.
Win9x video games that made bad assumptions about the stack were a theme I saw. One of the differences between win9x and NT based windows is that kernel32 (later kernelbase) is a now user mode wrapper atop ntdll, whereas in the olden days kernel32 would trap directly into the kernel. This means that kernel32 uses more user mode stack space in NT. A badly behaving app that stored data to the left of the stack pointer and called into kernel32 might see its data structures clobbered in NT and not in 9x. So there were compatibility hacks that temporarily moved the stack pointer for certain apps.
https://web.archive.org/web/20250423144746/https://cookieplm...
This sentence is the real takeaway point of the article. Undefined behavior is extremely insidious and can lull you into the belief that you were right, when you already made a mistake 1000 steps ago but it only got triggered now.
I emphasized this point in my article from years ago (but after the game was released):
> When a C or C++ program triggers undefined behavior, anything is allowed to happen in the program execution. And by anything, I really mean anything: The program can crash with an error message, it can silently corrupt data, it can morph into a colorful video game, or it can even give the right result.
> If you’re lucky, the program triggering UB will show an appropriate error message and/or crash, making you immediately aware that something went wrong. If you’re unlucky, the program will quietly mangle data, and by the time you notice the problem (via effects such as crashes or incorrect output) the root cause has been buried in the past execution history. And if you’re very unlucky, the program will do exactly what you hoped it should do, until you change some unrelated code / compiler versions / compiler vendors / operating systems / hardware platforms – and then a new bug becomes visible, and you have no clue why seemingly correct code now fails to work properly.
-- https://www.nayuki.io/page/undefined-behavior-in-c-and-cplus...
As I wrote in my article, this point really got hammered into me when a coworker showed me a patch that he made - which added a couple of innocuous, totally correct print statements to an existing C++ program - and that triggered a crash. But without his print statements, there was no crash. It turned out that there was a preexisting out-of-bounds array write, and the layout of the stack/heap somehow masked that problem before, and his unlucky prints unmasked the problem.
Okay so then, how can we do better as developers today?
0) Read, understand, and memorize what actions in C or C++ are undefined behavior. Avoid them in your code at all costs. Also obey the preconditions of any API you use, whether in the standard library, operating system, etc.
1) Compile your application in Debug mode and compare its behavior to Release mode. If they differ by anything other than speed, then you have a serious problem on your hands.
2) Compile and run with sanitizers like -fsanitize=undefined,address to catch undefined behavior at runtime.
3) Use managed languages like Java, C#, Python, etc. where you basically don't have to worry about UB in normal day-to-day code. Or use very well-designed low-level languages like Rust that are safe by default and minimize your exposure to UB when you really need to do advanced things. Whereas C and C++ have been a bonanza of UB like we have never seen before in any other language.
What compiler error would you expect here? Maybe not checking the return value from scanf to make sure it matches the number of parameters? Otherwise this seems like a data file error that the compiler would have no clue about.
(so, for example, this bug would have never been created by Rust unless it was deeply misused)
That's one hell of a language!
It's true that C may be unique-ish in this regard though- this bug also couldn't happen in Ruby, which is not a functional language, but Ruby certainly still makes undefined behaviors much more possible than in other languages like Elixir.
A little piece of technology made sense in the original context, but then it got moved to a different context without realizing that move broke the contract. Specifically in this case a flying boat became an airplane.
---
I recently worked a bug that feels very similar:
A linux cups printer would not print to the selected tray, instead it always requested manual feed.
Ok. Try a bunch of command line options, same issue.
Ok. Make the selection directly in the PPD (postscript printer definition) file. Same issue.
Ok! Decompile the PXL file. Wrong tray is set in pxl file... why?
Check Debug2 log level for cups - Wrong MediaPosition is being sent to ghostscript (which compiles the printer options into the print job) by a cups filter... why?
Cups filter is translating the MediaPosition from the PPD file... because the philosophy of cups is to do what the user intended. The intention inferred from MediaPosition in the PPD file (postscript printer definition) is that the MediaPosition corresponds to the PWG (Printer Working Group) MediaPosition, NOT the vendor MediaPosition (or local equivalent - in this case MediaSource).
AHA!! My PPD file had been copied from a previous generation of server, from a time when that cups filter did NOT translate the MediaPosition, so the VENDOR MediaSource numbers were used. Historically, this makes sense. The vendor tray number is set in the vendor ppd file because cups didn't know how to translate that.
Fast forward to a new execution context, and cups filters have gotten better at translating user intention, now it's translating a number that doesn't need to be translated, and silently selecting the wrong tray.
TLDR; There is no such thing as a printer command, only printer suggestions.
> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.
I suppose this is why Linux kernel maintainers insist on never breaking user space.
https://static.wikia.nocookie.net/gta-myths/images/f/f1/Egg_...
Really this is more a story about poor development practice than it is an interesting bug.
mschuster91•4h ago
On top of that, the hardware requirements (256MB of system RAM, and the PlayStation 2 only had 32MB) made it enough of a challenge to get the game running at all. Throwing in a heavyweight parsing library for either of these three languages was out of the question.
ceejayoz•4h ago
Magma7404•4h ago
Most of the time, the programmers who do this do not follow the simple rule that Stroustrup said which is to define or initialize a variable where you declare it (i.e. declare it before using it), and which would solve a lot of bugs in C++.
mschuster91•4h ago
Yeah but we're talking about a 2004 game that was pretty rushed after 2002's Vice City (and I wouldn't be surprised if the bug in the ingestion code didn't exist there as well, just wasn't triggered due to the lack of planes except that darn RC Chopper and RC plane from that bombing run mission). Back then, the tooling to spot UB and code smell didn't even exist or, if at all, it was very rudimentary, or the warnings that did come up were just ignored because everything seemed to work.
Yeask•4h ago
ryandrake•4h ago
butlike•3h ago
RHSeeger•3h ago
soulofmischief•3h ago
RHSeeger•3h ago
usefulcat•1h ago
I worked in gamedev around the time this game was made and this would have been very much an ordinary, everyday kind of bug. The only really exceptional thing about it is that it was discovered after such a long time.
trinix912•3h ago
badc0ffee•3h ago
trinix912•3h ago
mrighele•1h ago
You cannot initialize them with a different value unless you also write a constructor, but it not the issue here (since you are supposed to read them from the file system)
kevin_thibedeau•4h ago
butlike•3h ago
To be honest, I just don't like how you disparaged the programmer out-of-context. Talk is cheap.
db48x•3h ago
trinix912•3h ago
db48x•3h ago
But it is even more important for today’s game studios to see and understand the mistakes that yesterday’s studios made. That’s the only way to avoid making them all over again.
khedoros1•4m ago
And in 2004, didn't have a published specification, or much use outside of webdev (which hadn't eaten the world yet).
> and SAX parsers are 22
And, especially at the time, pretty much exclusive to Java, right?
Put another way, which are the high-quality open-source implementations of those formats that the developers should've considered while working on SA in 2003 and 2004? Or for that matter, in the 2001-2002 timeframe, when the parsing code was probably actually written for use in VC?
danbolt•2h ago
Your average hire for the time might have been self-taught with the occasional C89 tutorial book and two years of Digipen. Today’s graduates going into games have fallen asleep to YouTube lectures of Scott Meyers and memorized all the literature on a fixed timestep.
fragmede•1h ago
bluedino•4h ago
epcoa•4h ago
coldpie•3h ago
mrguyorama•3h ago
Besides, the complaint about not having a heavyweight parser here is weird. This is supposed to be "trusted data", you shouldn't have to treat the file as a threat, so a single line sscanf that's just dumping parsed csv attributes into memory is pretty great IMO.
Definitely initialize variables when it comes to C though.
CamouflagedKiwi•2h ago
This isn't that uncommon - look at something like Diablo 2 which has a huge amount of game data defined from text files (I think these are encoded to binary when shipped but it was clearly useful to give the game a mode where it'd load them all from text on startup).
db48x•4h ago
rfoo•3h ago
[0] https://docs.gtk.org/glib/
wat10000•38m ago
anthk•4h ago
PhilipRoman•1h ago
debugnik•32m ago
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...