I saw there were some attempts on Reddit, so I tried it myself.
Cross-compiled llama.cpp from macOS targeting Windows XP 64-bit. Main hurdles: downgrading cpp-httplib to v0.15.3 (newer versions explicitly block pre-Win8), replacing SRWLOCK/CONDITION_VARIABLE with XP-compatible threading primitives, and the usual DLL hell.
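For anyone curious, here's a simplified sketch of the kind of threading shim involved (not the exact code from my patch; the names are mine). XP has CRITICAL_SECTION but no SRWLOCK or CONDITION_VARIABLE, so the lock degrades to exclusive-only and the condvar gets emulated with a semaphore plus a waiter count:

    // Simplified sketch, not the exact code from my patch.
    #include <windows.h>
    #include <limits.h>

    typedef CRITICAL_SECTION xp_lock;   // SRWLOCK stand-in (exclusive only)
    static void xp_lock_init(xp_lock *l)    { InitializeCriticalSection(l); }
    static void xp_lock_acquire(xp_lock *l) { EnterCriticalSection(l); }
    static void xp_lock_release(xp_lock *l) { LeaveCriticalSection(l); }

    typedef struct {
        HANDLE        sem;  // waiters block on this semaphore
        volatile LONG n;    // current number of waiters
    } xp_cond;

    static void xp_cond_init(xp_cond *c) {
        c->sem = CreateSemaphore(NULL, 0, LONG_MAX, NULL);
        c->n   = 0;
    }

    static void xp_cond_wait(xp_cond *c, xp_lock *l) {
        InterlockedIncrement(&c->n);
        LeaveCriticalSection(l);              // release lock, then sleep
        WaitForSingleObject(c->sem, INFINITE);
        EnterCriticalSection(l);              // reacquire before returning
    }

    // Signal only; broadcast and timed waits need considerably more care.
    static void xp_cond_signal(xp_cond *c) {
        if (c->n > 0) {                       // wake at most one waiter
            InterlockedDecrement(&c->n);
            ReleaseSemaphore(c->sem, 1, NULL);
        }
    }

The classic pthreads-win32 write-ups cover the pitfalls of doing this correctly, especially for broadcast.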
Qwen 2.5-0.5B runs at ~2-8 tokens/sec on period-appropriate hardware. Not fast, but it works.
Video demo and build instructions are in the write-up.
Claude helped with most of the debugging on the build system. I just provided the questionable life choices.
vintagedave•2mo ago
Challenge: could you build for 32-bit? From memory, hardly anyone used XP64; 64-bit was mostly a Server-edition thing, and people only really migrated to 64-bit with Vista and Windows 7.
dandinu•2mo ago
Regarding your question:
I have a 32bit XP version as well, and I actually started with that one.
The problem I was facing was that it's naturally limited to 4GB RAM, out of which only 3.1GB are usable (I wanted to run some beefier models and 64bit does not have the RAM limit).
Also, the 32bit OS kept freezing at random times, which was a very authentic Windows XP experience, now that I think about it. :)
vintagedave•2mo ago
That would be a real issue. I vaguely recall methods to work around this - various mappings, PAE (the Intel extension for high memory addressing), AWE, etc: https://learn.microsoft.com/en-us/windows/win32/memory/addre...
Maybe unrealistic :( I doubt this is drop-in code.
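If I remember the AWE docs right, the call sequence is roughly this (untested sketch, error handling omitted; it needs the "Lock pages in memory" privilege, and IIRC client XP caps physical RAM at 4GB regardless of PAE, so it buys back address space more than total memory):

    // Untested sketch of the AWE call sequence; error handling omitted.
    #include <windows.h>

    int main() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        // Grab more physical pages than a 32-bit process could map at once.
        ULONG_PTR num_pages = (3ULL << 30) / si.dwPageSize;  // ~3GB
        ULONG_PTR *pfns = (ULONG_PTR *)HeapAlloc(
            GetProcessHeap(), 0, num_pages * sizeof(ULONG_PTR));
        AllocateUserPhysicalPages(GetCurrentProcess(), &num_pages, pfns);

        // Reserve a small virtual "window" we can remap over and over.
        SIZE_T window_bytes = 256u << 20;                    // 256MB
        void *window = VirtualAlloc(NULL, window_bytes,
                                    MEM_RESERVE | MEM_PHYSICAL,
                                    PAGE_READWRITE);

        // Map one window's worth of pages in, use them, then remap a
        // different slice of pfns[] for the next piece of the model.
        ULONG_PTR window_pages = window_bytes / si.dwPageSize;
        MapUserPhysicalPages(window, window_pages, pfns);
        // ... touch *window here ...
        MapUserPhysicalPages(window, window_pages, NULL);    // unmap

        FreeUserPhysicalPages(GetCurrentProcess(), &num_pages, pfns);
        return 0;
    }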
dandinu•2mo ago
The problem is that llama.cpp would need to be substantially rewritten to use it: you'd basically be implementing your own memory manager that swaps chunks of the model weights in and out of your addressable space. It's not impossible, but it's a pretty gnarly undertaking for what amounts to "running AI on a museum piece."
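Just to make "gnarly" concrete: every weight access would have to go through something like this instead of a plain mmap'd pointer (hypothetical sketch, nothing like it exists in llama.cpp):

    // Hypothetical sketch. Weights stay in a file on disk; a small pool
    // of fixed-size chunks lives in the ~2-3GB the process can actually
    // address, evicted least-recently-used.
    #include <cstdint>
    #include <cstdio>
    #include <list>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct ChunkCache {
        static const size_t kChunk = 64u << 20;    // 64MB per chunk

        FILE *f;                                   // the weights file
        size_t max_resident;                       // chunks kept in RAM
        std::list<size_t> lru;                     // front = hottest
        std::unordered_map<size_t,
            std::pair<std::vector<uint8_t>,
                      std::list<size_t>::iterator>> cache;

        ChunkCache(const char *path, size_t max_chunks)
            : f(std::fopen(path, "rb")), max_resident(max_chunks) {}

        // Returns a pointer to chunk idx; valid only until the next get().
        const uint8_t *get(size_t idx) {
            auto it = cache.find(idx);
            if (it != cache.end()) {               // hit: bump to front
                lru.splice(lru.begin(), lru, it->second.second);
                return it->second.first.data();
            }
            if (cache.size() >= max_resident) {    // miss: evict coldest
                cache.erase(lru.back());
                lru.pop_back();
            }
            std::vector<uint8_t> buf(kChunk);      // read chunk from disk
            _fseeki64(f, (int64_t)idx * kChunk, SEEK_SET);
            std::fread(buf.data(), 1, kChunk, f);
            lru.push_front(idx);
            auto &slot = cache[idx];
            slot.first  = std::move(buf);
            slot.second = lru.begin();
            return slot.first.data();
        }
    };

And that still hand-waves away thread safety, prefetching, and picking a chunk size that doesn't thrash.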