Regarding 'Me or Claude': The core concept (applying bioinformatics edit-distance/alignment to compression rather than just exact prefix matching) is something I worked on back in 2013. The implementation in this repo was heavily assisted by Claude, yes.
You're right that DEFLATE and modern algos (Zstd, Brotli) are the production standard. This project isn't trying to replace Zstd tomorrow; it's a research prototype testing the hypothesis that fuzzy matching + edit scripts can squeeze out entropy that exact-match dictionaries miss. The 8-10x slowdown means it's definitely experimental, but as a starting point for further exploration? That's what I want.
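To make the hypothesis concrete, here's a toy sketch of the core move (illustrative only, not the repo's code; `difflib` opcodes stand in for a real alignment, and opcode count stands in for coded script size):

```python
# Toy sketch: instead of requiring an exact dictionary hit (as LZW does),
# pick the *nearest* dictionary entry and emit (code, edit_script).
# All names here are illustrative, not from the LZW-X repo.
from difflib import SequenceMatcher

def edit_script(src: str, dst: str):
    """Non-equal opcodes that turn src into dst: the 'patch' sent with the code."""
    return [op for op in SequenceMatcher(None, src, dst).get_opcodes()
            if op[0] != "equal"]

def encode_token(token: str, dictionary: list):
    # Exact matching fails unless token is literally in the dictionary;
    # fuzzy matching picks the closest entry and patches the difference.
    # A real coder would use Levenshtein distance and cost the script in bits.
    best = min(range(len(dictionary)),
               key=lambda i: len(edit_script(dictionary[i], token)))
    return best, edit_script(dictionary[best], token)

dictionary = ["compression", "comparison"]
code, script = encode_token("compressing", dictionary)
print(code, script)  # a dictionary code plus a short patch, not raw literals
```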
As an overall note: when you prompt an AI with "apply concept X in Y" (or anything, really), it will tell you what a great idea it is and then output something that, without domain knowledge, you have no way of judging: whether it's correct, or whether it even makes sense at all. If you don't want to do a literature review, I'd recommend at least throwing the design back to the machine and asking for critique.
Here's what actually happened: the path to get here was about as far from a 'one-shot' as you can get. The first iteration (Basic LZW + unbounded edit scripts + Huffman) was roughly 100x slower. I spent hours guiding the implementation through specific optimization attempts:
- BK-trees for lookups (eventually discarded as slow; a toy version is sketched after this list).
- Then switching to arithmetic coding: first one model for both codes and scripts, later splitting them.
- Various strategies for pruning/resetting unbounded dictionaries.
- Finally landing on a fixed dict size with a Gray-Code-style nearest neighbor search to cap the exploration.
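For anyone curious, the BK-tree attempt (first bullet above) was in the spirit of the sketch below: index dictionary entries by edit distance and use the triangle inequality to prune the lookup. Illustrative code, not the repo's; in practice it was still too slow:

```python
# Toy BK-tree: a metric tree over edit distance. Children are keyed by their
# distance to the parent, so a query within tolerance `tol` only needs to
# descend into children whose key lies in [d - tol, d + tol].

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

class BKTree:
    def __init__(self, word: str):
        self.word, self.children = word, {}  # children keyed by distance

    def add(self, word: str):
        node = self
        while True:
            d = levenshtein(word, node.word)
            if d in node.children:
                node = node.children[d]
            else:
                node.children[d] = BKTree(word)
                return

    def search(self, query: str, tol: int):
        """All entries within edit distance tol of query."""
        out, stack = [], [self]
        while stack:
            node = stack.pop()
            d = levenshtein(query, node.word)
            if d <= tol:
                out.append((d, node.word))
            # Triangle inequality: only children in [d - tol, d + tol] can match.
            stack.extend(child for dist, child in node.children.items()
                         if d - tol <= dist <= d + tol)
        return out

tree = BKTree("string")
for w in ("strong", "sting", "sprint"):
    tree.add(w)
print(tree.search("stringy", tol=2))
```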
The AI suggested some tactical fixes (like capping the Levenshtein table, or splitting edits/codes in arithmetic coding), but the architectural pivots came from me. I had to find the winning path. I stopped when the speed hit 'sit-there-and-watch-it-able' (approx 15s for 2MB) and the ratio consistently beat LZW (interestingly, at smaller dictionary sizes, which makes sense: the edit scripts make each word more expressive).
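For concreteness, "capping the Levenshtein table" means something like the banded computation below: only fill cells within `max_d` of the diagonal and reject a candidate as soon as the cap is blown. Again, an illustrative sketch rather than the repo's code:

```python
# Banded Levenshtein with a cap: fills O(len(a) * max_d) cells instead of the
# full table, and exits early once no path can come in under max_d.

def bounded_levenshtein(a: str, b: str, max_d: int):
    """Edit distance if it is <= max_d, else None (candidate rejected)."""
    if abs(len(a) - len(b)) > max_d:
        return None                      # length gap alone exceeds the cap
    INF = max_d + 1                      # sentinel for cells outside the band
    prev = [j if j <= max_d else INF for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        lo, hi = max(1, i - max_d), min(len(b), i + max_d)   # the band
        cur = [INF] * (len(b) + 1)
        cur[0] = i if i <= max_d else INF
        for j in range(lo, hi + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        if min(cur) > max_d:             # early exit: cap already blown
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_d else None

print(bounded_levenshtein("kitten", "sitting", 3))  # 3
print(bounded_levenshtein("kitten", "sitting", 2))  # None: rejected early
```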
That was my bar: Is it real? Does it work? Can it beat LZW? Once it did, I shared it. I was focused on benchmark accuracy, not the marketing copy. I let the AI write the hype README; I didn't really think it mattered.
In 2013, I was studying bioinformatics and had an idea: apply something like sequence alignment and edit scripts to compression, instead of only appending to the end of the string as LZW does. So the idea for LZW-X was born long ago, but it wasn't until recently, with the help of AI, that I could implement and test it properly.
This is that proper implementation, and it bears out what I intuited: there are gains to be had with a method like this. I consider it a first rung, a starting point for further exploration.
Check it out: https://github.com/BrowserBox/LZW-X