The thing to remember is that the "E" in "ECMA" does not stand for "ANSI".
* https://ecma-international.org/publications-and-standards/st...
* https://ecma-international.org/publications-and-standards/st...
* https://www.itu.int/rec/T-REC-T.416-199303-I
If you read ECMA-35, you'll find that there's actually a whole system to escape sequences and control sequences. As I pointed out last month, it's often the case that people who haven't read ECMA-35 don't realize that parameter characters can be more than digits, don't handle intermediate characters, and don't grasp how DEC's question mark and SCO's equals sign fit into the overall picture. People who haven't read ECMA-48 and traced its history don't realize that there's subtlety to missing parameters in control sequences. And people who haven't read ITU/IEC T.416 do what many of us did years ago and get 24-bit colour wrong. Twice over.
* https://github.com/tattoy-org/tattoy/issues/105#issuecomment...
Other common errors include missing out on all of the other 7-bit aliases for C1 characters. Or not realising that the ECMA-35/ECMA-48 syntax allows for any control sequence to have sub-parameters, not just SGR. Or using regular expressions and pattern matching instead of a state machine. Only a state machine truly handles the fact that in the real world terminals allowed, and enacted, various C0 and C1 control characters in the middle of control sequences, as well as had ways of cancelling or restarting control sequences mid-sequence.
* https://github.com/jdebp/nosh/blob/trunk/source/ECMA48Decode...
But it gets even worse for a real world control sequence decoder.
In the real world, not only do terminals interpret the same control sequences, and their parameters, differently depending from whether the terminal is sending or receiving them; but several terminal emulators like the one in Interix, rxvt, the one built in to Linux, and even XTerm, send control sequences that not only break ECMA-35 but also conflict with received control sequences. So if one wants to be comprehensive and be cabable of decoding real data, one needs a switch to tell the program whether to decode the character stream as if it is being received by the terminal or as if it is being sent by the terminal.
* https://jdebp.uk/Softwares/nosh/guide/commands/console-decod...
Microsoft Terminal tries to do things properly, which many modern terminal emulators and tools do not, and handles this with two distinct entire state machines, one for input and one for output.
* https://github.com/microsoft/terminal/tree/main/src/terminal...
I handled it with a few goto statements and a handful of flags. (-:
* https://github.com/jdebp/nosh/blob/trunk/source/console-deco...
https://github.com/webpro/ANSI.tools/blob/main/packages/parser/src/parsers/csi.ts#L12
The comment correctly identifies the 0x30-0x3f range as parameter bytes and the following as intermediate bytes. Both the range and the names for the bytes are matching ECMA-48 Chapter 5.4.But you seem to think that everyone except yourself is incompetent, are you trying to make up for something?
Your superficial test tested all three of the special cases in the PRIVATE_OPENERS array, which is what the parser.ts code actually checks. DEC's question mark, which is special cased yet further off on its own, is in reality another "private opener", too, and it isn't limited to DEC (e.g. XQTMODKEYS), and neither does DEC not use the other non-digit parameter characters (e.g. DECDA3).
(There's a hypothesis that DEC's own state machine didn't care where these marker characters were, as it was a simple state machine that had to fit in ROM and probably just set a bitflag. A mistake that we're probably all still making is assuming that they only take effect when in the very first position.)
STRING_OPENERS is another widespread special casing that people do, treating ESC plus a few characters as special rather than handling all of the 7-bit aliases for the C1 characters as the general case.
You seem to think that people who share what the mistakes are and where they themselves have made these very mistakes over the years, to help other people not make them and so that the world continues to remember this hard-learned stuff, is somehow worthy of ad hominems, straw men, insults, and vilification right off the bat. That's a very poor show and you should be ashamed.
But then we have this in your post:
> That tells me that you are writing from ignorance, as for starters that's a truly pathetic test
and
> I had an actual poke around the parser code, in contrast to your superficial experimentation.
Perhaps you really did intend for these lines to be helpful and informative? If so, I encourage you to have a moment of empathy for your interlocutor and ask yourself if talking this way is actually the best way to communicate and pass on this hard-earned knowledge.
> ad hominems, straw men, insults, and vilification
I didn't see this from the other poster. I did see it from you. As a disinterested third party, I'm just telling you, you come off way worse in this exchange. Good luck out there buddy.
I noticed that it works with _escaped_ ESC characters ("\x1b", "\u001b", "\033") but it didn't recognize raw ESC characters that I had in my clipboard. It might be useful to support those (maybe highlight them similarly to how VS Code highlights whitespace characters). The characters show up as numbered unicode error glyphs (I'm on Firefox, if that helps)
I was working at my first job and we had a ColdFusion app that was displaying some data from the database. I get a ticket one day saying our search page would crash when searching for a very specific document. The other 1 million+ documents all loaded fine to our knowledge, so why this one?
I was pretty junior back then and feeling mighty defeated as to why I couldn't figure it out. I debugged every single line and condition, trying to find some reason. After ruling out the code as a culprit, I took the data we were loading and placed it into Notepad++. Don't remember why exactly. I was wracking my brain trying to come up with explanation and lazily moving the text cursor left and right through the text, mostly out of boredom and despair.
That's when I noticed that I had pressed the right arrow key in my keyboard and the text cursor position hadn't changed! I pressed it again and nothing. Again, nothin. It took eight key presses to move the text cursor from one letter in a word to the adjacent letter. I was utterly bamboozled. Why was the text cursor getting stuck in the middle of this word?!
Shortly thereafter, I discovered "Show all hidden characters" setting in the menu. I toggled it and sure enough there were little black boxes with weird three letter strings in them. NUL, ESC, and others - right where my cursor was getting hung up.
That was the day I learned about ANSI control characters and the importance of data sanitization.
I wish I had this when I was making, [Dragon's Oven](https://tronster.itch.io/dragon). It was a lot of nights and weekends of tinkering with ANSI codes in Typescript. I learned a lot that surprised me, such as: most modern OS's still don't support 16m colors out of the box and that the default Linux shell doesn't support beyond 16 colors. Also no really good modern ANSI editors out there. I tried bringing back "TheDraw" in DosBOX for some art, but ended up using a mismatch of more modern utilities, false starting one of my own, and working on an image to ASCii/ANSI converter.
Maybe it's growing up in the BBS days, but something about ANSI is really charming.
I also later used ANSI to make my own cool command line prompts in DOS and later, Linux.
Is this tool really helpful? It does look nice! But it does not help with the corneriest cases that would benefit from such tool the most.
It worked great for me, seems much easier to debug logs directly in the terminal.
webpro•3d ago
This free web-based tool helps to inspect the input, visualize colors and styling, and list control codes. By using a proper tokenizer and parser (not just regex hacks), it supports all sorts of control codes. The parser is open source and available too (find links in "about").
Type or paste text in the black text area, or try out the examples. Use the lookup table to filter & find specific codes.
Feedback welcome, I’d love to know what’s confusing, missing, or especially useful.
michaelmior•4h ago