Inspect ANSI control codes and escape sequences

115•webpro•6mo ago

Comments

webpro•6mo ago

Working with and debugging ANSI control codes and escape sequences can be a challenge.

This free web-based tool helps to inspect the input, visualize colors and styling, and list control codes. By using a proper tokenizer and parser (not just regex hacks), it supports all sorts of control codes. The parser is open source and available too (find links in "about").

Type or paste text in the black text area, or try out the examples. Use the lookup table to filter & find specific codes.

Feedback welcome, I’d love to know what’s confusing, missing, or especially useful.

michaelmior•6mo ago

Very cool! Seems like this should be a Show HN post.

JdeBP•6mo ago

The revealing shibboleth is when people call it "ANSI". (-: "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works, from Microsoft's old ANSI.SYS appendix to its MS-DOS user manual, to innumerable modern WWW sites all repeating received wisdom.

The thing to remember is that the "E" in "ECMA" does not stand for "ANSI".

* https://ecma-international.org/publications-and-standards/st...

* https://www.itu.int/rec/T-REC-T.416-199303-I

If you read ECMA-35, you'll find that there's actually a whole system to escape sequences and control sequences. As I pointed out last month, it's often the case that people who haven't read ECMA-35 don't realize that parameter characters can be more than digits, don't handle intermediate characters, and don't grasp how DEC's question mark and SCO's equals sign fit into the overall picture. People who haven't read ECMA-48 and traced its history don't realize that there's subtlety to missing parameters in control sequences. And people who haven't read ITU/IEC T.416 do what many of us did years ago and get 24-bit colour wrong. Twice over.

* https://github.com/tattoy-org/tattoy/issues/105#issuecomment...
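To make the structure described above concrete, here is a minimal Python sketch. It is not taken from any of the linked projects, and the name `split_csi` and its return shape are made up for illustration. It assumes the control sequence has already been isolated, and splits what follows CSI into parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F) and a final byte (0x40-0x7E). A leading `<`, `=`, `>` or `?` is reported as a private marker, and a missing parameter is kept as `None` instead of being given a guessed default.

```python
def split_csi(body):
    """Split the part after CSI, e.g. '?25h' or '38:2::255:0:0m', into its pieces."""
    i = 0
    while i < len(body) and 0x30 <= ord(body[i]) <= 0x3F:
        i += 1  # parameter bytes: digits and : ; < = > ?
    j = i
    while j < len(body) and 0x20 <= ord(body[j]) <= 0x2F:
        j += 1  # intermediate bytes: space through /
    # Exactly one byte must remain, and it must be a final byte.
    if j != len(body) - 1 or not 0x40 <= ord(body[j]) <= 0x7E:
        raise ValueError(f"not a complete control sequence body: {body!r}")
    params, intermediates, final = body[:i], body[i:j], body[j]

    # A first parameter byte in 0x3C-0x3F marks the sequence as private use.
    private = ""
    if params and params[0] in "<=>?":
        private, params = params[0], params[1:]

    def number(text):
        # Missing (or non-numeric) values stay None; the default that applies
        # depends on the control function, so it is not filled in here.
        return int(text) if text.isdigit() else None

    # ';' separates parameters, ':' separates sub-parameters inside one parameter.
    parsed = []
    if params:
        parsed = [[number(sub) for sub in p.split(":")] for p in params.split(";")]
    return {"private": private, "params": parsed,
            "intermediates": intermediates, "final": final}
```

For example, `split_csi("1;;3H")` keeps the empty middle parameter as `[None]`, and `split_csi(" q")` reports a space as an intermediate byte with no parameters at all.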

Other common errors include missing out on all of the other 7-bit aliases for C1 characters. Or not realising that the ECMA-35/ECMA-48 syntax allows for any control sequence to have sub-parameters, not just SGR. Or using regular expressions and pattern matching instead of a state machine. Only a state machine truly handles the fact that in the real world terminals allowed, and enacted, various C0 and C1 control characters in the middle of control sequences, as well as had ways of cancelling or restarting control sequences mid-sequence.

* https://github.com/jdebp/nosh/blob/trunk/source/ECMA48Decode...
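A heavily simplified sketch of the state-machine approach, in Python, may help show why pattern matching falls short. This is not the linked decoder: the function name `decode_stream` and the event tuples are hypothetical, and it covers only 7-bit input, with control strings (OSC, DCS and friends) and 8-bit C1 characters left out. What it does show is a C0 control being acted on in the middle of a control sequence, CAN and SUB cancelling a sequence, and ESC restarting one.

```python
ESC, CAN, SUB = "\x1b", "\x18", "\x1a"


def decode_stream(stream):
    """Turn a character stream into a list of (kind, payload) events."""
    events = []
    state, body = "ground", ""
    for ch in stream:
        code = ord(ch)
        if ch == ESC:
            # ESC always starts a new sequence, abandoning one in progress.
            state, body = "escape", ""
        elif ch in (CAN, SUB):
            # CAN and SUB cancel whatever sequence is in progress.
            if state != "ground":
                events.append(("cancelled", body))
            state, body = "ground", ""
        elif code < 0x20:
            # Other C0 controls take effect at once; the sequence carries on.
            events.append(("control", ch))
        elif state == "escape":
            if ch == "[" and not body:
                state = "csi"
            elif 0x20 <= code <= 0x2F:
                body += ch  # intermediate byte of an escape sequence
            else:
                events.append(("escape", body + ch))
                state, body = "ground", ""
        elif state == "csi":
            body += ch
            if 0x40 <= code <= 0x7E:
                events.append(("csi", body))  # final byte ends the sequence
                state, body = "ground", ""
            elif not 0x20 <= code <= 0x3F:
                events.append(("malformed", body))
                state, body = "ground", ""
        else:
            events.append(("text", ch))
    return events
```

With this sketch, `decode_stream("\x1b[3\n1m")` reports the line feed as a control event and still completes the control sequence as `31m`, which a single regular expression over the raw text would not do.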

But it gets even worse for a real world control sequence decoder.

In the real world, not only do terminals interpret the same control sequences, and their parameters, differently depending on whether the terminal is sending or receiving them; but several terminal emulators like the one in Interix, rxvt, the one built in to Linux, and even XTerm, send control sequences that not only break ECMA-35 but also conflict with received control sequences. So if one wants to be comprehensive and be capable of decoding real data, one needs a switch to tell the program whether to decode the character stream as if it is being received by the terminal or as if it is being sent by the terminal.

* https://jdebp.uk/Softwares/nosh/guide/commands/console-decod...
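One well-known case of the conflict described above is the Linux console, which sends `ESC [ [ A` through `ESC [ [ E` for F1 to F5. Under the ECMA-48 rules the second `[` is already a final byte, so a strict decoder sees a complete control sequence followed by a stray letter. The Python sketch below illustrates the kind of switch being described; the function name `decode_first`, the `sent_by_terminal` flag and the tuple shapes are invented for this example and are not the interface of the linked tool.

```python
# F1-F5 as sent by the Linux console: ESC [ [ A .. ESC [ [ E
LINUX_CONSOLE_KEYS = {"A": "F1", "B": "F2", "C": "F3", "D": "F4", "E": "F5"}


def decode_first(data, sent_by_terminal):
    """Decode the first item of `data` (which must start with ESC [).

    Returns (item, rest_of_data).
    """
    if not data.startswith("\x1b["):
        raise ValueError("expected data starting with ESC [")
    # Input direction only: recognise the non-conforming function key form.
    if sent_by_terminal and data[2:3] == "[" and data[3:4] in LINUX_CONSOLE_KEYS:
        return ("key", LINUX_CONSOLE_KEYS[data[3]]), data[4:]
    # Otherwise apply the ECMA-48 rule: the first byte in 0x40-0x7E ends it.
    for end in range(2, len(data)):
        if 0x40 <= ord(data[end]) <= 0x7E:
            return ("csi", data[2:end + 1]), data[end + 1:]
    return ("incomplete", data[2:]), ""
```

The same four bytes decode as a function key when `sent_by_terminal` is true, and as a control sequence plus a leftover `A` when it is false.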

Microsoft Terminal tries to do things properly, which many modern terminal emulators and tools do not, and handles this with two distinct entire state machines, one for input and one for output.

* https://github.com/microsoft/terminal/tree/main/src/terminal...

I handled it with a few goto statements and a handful of flags. (-:

* https://github.com/jdebp/nosh/blob/trunk/source/console-deco...

blueflow•6mo ago

I think this rant is out of place here. Type "\x1b[:<=>$t" and check for yourself. It parses correctly. You only learn about the allowed character ranges for CSI sequences from ECMA-48, not from the Microsoft docs, so I guess the author did their homework.

JdeBP•6mo ago

That tells me that you are writing from ignorance, as for starters that's a truly pathetic test that even misses one of the characters that I explicitly mentioned above, let alone thoroughly tests the full range that the specs define. I had an actual poke around the parser code, in contrast to your superficial experimentation. (-: One can, with knowledge, actually find the point where the only three unusual characters that you in fact tested are special cased.

blueflow•6mo ago

They are not special cased:

  https://github.com/webpro/ANSI.tools/blob/main/packages/parser/src/parsers/csi.ts#L12

The comment correctly identifies the 0x30-0x3f range as parameter bytes and the following as intermediate bytes. Both the range and the names for the bytes are matching ECMA-48 Chapter 5.4.

But you seem to think that everyone except yourself is incompetent, are you trying to make up for something?

JdeBP•6mo ago

Of course they are. There's a file with all of the special cased constants in, named constants.ts.

Your superficial test tested all three of the special cases in the PRIVATE_OPENERS array, which is what the parser.ts code actually checks. DEC's question mark, which is special cased yet further off on its own, is in reality another "private opener", too. It isn't limited to DEC (e.g. XQTMODKEYS), and DEC does use the other non-digit parameter characters as well (e.g. DECDA3).

(There's a hypothesis that DEC's own state machine didn't care where these marker characters were, as it was a simple state machine that had to fit in ROM and probably just set a bitflag. A mistake that we're probably all still making is assuming that they only take effect when in the very first position.)

STRING_OPENERS is another widespread special casing that people do, treating ESC plus a few characters as special rather than handling all of the 7-bit aliases for the C1 characters as the general case.
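The general case referred to here is small enough to sketch in Python. In the 7-bit representation, ESC followed by a byte in 0x40-0x5F stands for the C1 control whose code is that byte plus 0x40, so CSI, OSC, DCS and the rest all fall out of one rule. The function name and the short name table below are illustrative only and are not taken from any of the parsers discussed.

```python
# A few of the C1 controls, for readable output; the table is not complete.
C1_NAMES = {0x90: "DCS", 0x98: "SOS", 0x9B: "CSI", 0x9C: "ST",
            0x9D: "OSC", 0x9E: "PM", 0x9F: "APC"}


def c1_from_escape(second_byte):
    """Return the C1 code that ESC + second_byte stands for, or None."""
    code = ord(second_byte)
    if 0x40 <= code <= 0x5F:
        return code + 0x40  # ESC @ .. ESC _ map onto 0x80 .. 0x9F
    return None  # some other kind of escape sequence


def describe(second_byte):
    code = c1_from_escape(second_byte)
    if code is None:
        return "not a C1 alias"
    return C1_NAMES.get(code, f"C1 0x{code:02X}")
```

So `describe("[")` gives `CSI` and `describe("]")` gives `OSC`, while `describe("M")` falls through to the generic `C1 0x8D` label.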

You seem to think that people who share what the mistakes are and where they themselves have made these very mistakes over the years, to help other people not make them and so that the world continues to remember this hard-learned stuff, is somehow worthy of ad hominems, straw men, insults, and vilification right off the bat. That's a very poor show and you should be ashamed.

clucas•6mo ago

> people who share what the mistakes are and where they themselves have made these very mistakes over the years, to help other people not make them and so that the world continues to remember this hard-learned stuff

But then we have this in your post:

> That tells me that you are writing from ignorance, as for starters that's a truly pathetic test

and

> I had an actual poke around the parser code, in contrast to your superficial experimentation.

Perhaps you really did intend for these lines to be helpful and informative? If so, I encourage you to have a moment of empathy for your interlocutor and ask yourself if talking this way is actually the best way to communicate and pass on this hard-earned knowledge.

> ad hominems, straw men, insults, and vilification

I didn't see this from the other poster. I did see it from you. As a disinterested third party, I'm just telling you, you come off way worse in this exchange. Good luck out there buddy.

webpro•6mo ago

That's some interesting feedback, thanks for sharing. I'll see what I can extract and apply from it. Please bear with me, this is only my initial take on the whole concept (and as you point out, it isn't that trivial). Didn't have many examples to be inspired by, but we're on our way anyway.

webpro•6mo ago

Some of the issues mentioned in this thread have been improved, including private CSI sequences, default param values, and cancellation + substitution.

webpro•6mo ago

Thanks. Agreed. The way I see it, ignore the noise and there might be something in there.

ForOldHack•6mo ago

I'll never forget the comments in termcap: "Brain dead", "Very Brain dead" and "Brain? What brain!" I think most of that was about terminals whose CTEOL (clear to end of line) was just garbage.

We just knew that at some point in time, all the Hazeltine terminals were going to end up in the garbage, which is what they deserved, and no one would rescue them.

https://www.shallowsky.com/linux/noaltscreen.html

The parent post is SOLID GOLD.

j4_james•6mo ago

> "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works

People just use "ANSI" as a shorthand for ANSI X3.64-1979. And that was the standard that DEC used for their VT100+ range of terminals, which in turn became the de facto standard from which most modern terminal emulators are derived. If you read the DEC documentation, you'll find many references to "ANSI standard", "ANSI controls", "ANSI colors", etc. I don't think this is because they were ignorant of the subject matter, considering that they were members of the committee that produced that standard.

And ECMA-48 is essentially just the European equivalent of ANSI X3.64, and was developed in parallel. But obviously an American company like DEC or Microsoft would more likely be working from the American version of the standard rather than the European one.

mnurzia•6mo ago

Neat tool, I could see this being handy for debugging TUI tools.

I noticed that it works with _escaped_ ESC characters ("\x1b", "\u001b", "\033") but it didn't recognize raw ESC characters that I had in my clipboard. It might be useful to support those (maybe highlight them similarly to how VS Code highlights whitespace characters). The characters show up as numbered unicode error glyphs (I'm on Firefox, if that helps)
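One way a tool could accept both forms is to normalise the escaped spellings to the raw ESC character before tokenizing. The Python sketch below is an illustration only and is not how ANSI.tools handles it; the name `to_raw` is made up, and the pattern is deliberately naive (it does not honour an escaped backslash, and `\e` followed by ordinary letters would also be rewritten).

```python
import re

# The escaped spellings mentioned above, plus the \e shorthand some shells accept.
ESCAPED_ESC = re.compile(r"\\(?:x1b|u001b|033|e)", re.IGNORECASE)


def to_raw(text):
    """Replace escaped spellings of ESC with the real character (0x1B)."""
    return ESCAPED_ESC.sub("\x1b", text)
```

Input that already contains raw ESC characters passes through unchanged, so the same tokenizer can run on either kind of paste.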

webpro•6mo ago

Thanks, this is great feedback. I'll see what I can do, stay tuned.

webpro•6mo ago

Raw input should be cool now (there's "raw" in the examples as well)

ryan-c•6mo ago

This is really cool - I've been experimenting with terminal escape sequences recently, and they go deep. Thanks for sharing! Get in touch (email in profile) if you'd like to collaborate.

webpro•6mo ago

Thanks! It's all open source (including the tokenizer/parser), so feel free to come collaborate on GitHub.

SpaceL10n•6mo ago

The things they don't prepare you for in school...

I was working at my first job and we had a ColdFusion app that was displaying some data from the database. I get a ticket one day saying our search page would crash when searching for a very specific document. The other 1 million+ documents all loaded fine to our knowledge, so why this one?

I was pretty junior back then and feeling mighty defeated as to why I couldn't figure it out. I debugged every single line and condition, trying to find some reason. After ruling out the code as a culprit, I took the data we were loading and placed it into Notepad++. Don't remember why exactly. I was wracking my brain trying to come up with an explanation and lazily moving the text cursor left and right through the text, mostly out of boredom and despair.

That's when I noticed that I had pressed the right arrow key on my keyboard and the text cursor position hadn't changed! I pressed it again and nothing. Again, nothing. It took eight key presses to move the text cursor from one letter in a word to the adjacent letter. I was utterly bamboozled. Why was the text cursor getting stuck in the middle of this word?!

Shortly thereafter, I discovered "Show all hidden characters" setting in the menu. I toggled it and sure enough there were little black boxes with weird three letter strings in them. NUL, ESC, and others - right where my cursor was getting hung up.

That was the day I learned about ANSI control characters and the importance of data sanitization.
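For anyone facing the same kind of data, here is a small Python sketch of both halves of that lesson: making hidden control characters visible, and stripping them before the text goes any further. The function names are made up, and keeping tab, line feed and carriage return is an assumption about what the application wants, not a rule.

```python
KEEP = "\t\n\r"  # assumption: ordinary whitespace controls are wanted


def visible_controls(text):
    """Swap C0 controls and DEL for the Unicode 'control picture' symbols."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in KEEP:
            out.append(chr(0x2400 + code))  # U+2400 block mirrors the C0 set
        elif code == 0x7F:
            out.append("\u2421")  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


def strip_controls(text):
    """Drop C0 controls (except KEEP) and DEL from the text."""
    return "".join(ch for ch in text
                   if (ord(ch) >= 0x20 or ch in KEEP) and ord(ch) != 0x7F)
```

Running `visible_controls` on a suspect field shows NUL and ESC as visible symbols, much like the editor's "show all characters" view, and `strip_controls` removes them.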

txdv•6mo ago

There are also zero-width space characters, yeah UTF is a rabbit hole

webpro•6mo ago

Emojis and other unicode characters may or may not be rendered as a single-width character. I've been splitting hairs and strings.

The tool currently counts any unicode character as a single-width one.
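As a rough illustration of what a better estimate might look like, here is a Python sketch using only the standard `unicodedata` module. It is an approximation and not what the tool does: the function name is made up, terminals disagree among themselves about width, and emoji joined with zero-width joiners are not treated as a single cell here.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal cells `text` occupies (approximate)."""
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. zero-width space)
        # are assumed to take no cell of their own.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters usually take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

With this, a zero-width space inside a word adds nothing to the count, and CJK text counts two cells per character.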

wpm•6mo ago

Similarly, I once copied a shell script out of Slack and saw a bunch of red errors from my IDE when I pasted it in. The errors were on every line that had a new line on it. The error was "â: Command not found", despite there being no such character on the line.

Pasted it into a hex editor, tracked down the bytes, and while I can't currently remember specifically what the encoding problem was, it was something to do with going between UTF-8 > ISO-8859 > UTF-8 again.

I've since aliased `pbpaste | xxd` (macOS, linux has similar CLI tools for working with the clipboard depending on your distro/DE), because weird shit like this comes up more often than I'd care to admit. Last rabbit hole was discovering that in macOS 15, Apple changed one of the "space" characters in the default Screenshot file names, but only if your Mac is set to use 12-hour time: the character between the time and AM/PM went from a normal ASCII 0x20 space to a Unicode U+202F NNBSP "narrow no-break space", which was causing S3 uploads to fail.
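Both stories can be reproduced in a few lines of Python. This is a sketch with made-up function names: `non_ascii_report` lists every character outside ASCII with its code point and name, and `mojibake` simulates UTF-8 bytes being misread as ISO-8859-1. The latter is only one plausible mechanism for the "â" error, since the exact encoding chain in the Slack case is not known.

```python
import unicodedata


def non_ascii_report(text):
    """List (index, code point, name) for every non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(text) if ord(ch) > 0x7F]


def mojibake(text):
    """Simulate UTF-8 bytes being decoded as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")
```

A file name fragment such as `"9.41\u202fAM"` yields a single report entry for NARROW NO-BREAK SPACE. Passing that same character through `mojibake` produces a string that starts with "â", because its UTF-8 encoding begins with the byte 0xE2.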

ddd34drf3•6mo ago

CudaText is better than Notepad++ in this regard. It shows ASCII control chars always. The option for "unprinted chars" only hides "arrows" over spaces/tabs.

tronster•6mo ago

This is a fantastic web util; bookmarked for the future.

I wish I had this when I was making [Dragon's Oven](https://tronster.itch.io/dragon). It was a lot of nights and weekends of tinkering with ANSI codes in TypeScript. I learned a lot that surprised me, such as: most modern OS's still don't support 16m colors out of the box and that the default Linux shell doesn't support beyond 16 colors. Also no really good modern ANSI editors out there. I tried bringing back "TheDraw" in DOSBox for some art, but ended up using a mishmash of more modern utilities, false starting one of my own, and working on an image to ASCII/ANSI converter.

Maybe it's growing up in the BBS days, but something about ANSI is really charming.

prometheus76•6mo ago

TheDraw was a cornerstone of my teenage years. I would log into different BBSs just to see their ANSI welcome screens, then I would try and re-create them to learn the art. It was a unique form of animation and I was hoping you had figured out how to get TheDraw working.

I also later used ANSI to make my own cool command line prompts in DOS and later, Linux.

ForOldHack•6mo ago

Recreate them? We would steal the stream, save it, run it through a hex editor, while watching it draw in a separate window. It got to be just a work of wonder what people came up with, and then my friend got an Amiga, and those splash screens... omg...

codesnik•6mo ago

I wonder how many languages have nice looking "\e" for "\u001b". ruby, perl, bash, anything else?

112233•6mo ago

"\u001b[0m — reset" ... what? Why is SGR not called by name, while, e.g. CUU is? Strange... According to which terminal or standard does it interpret sequences?

Is this tool really helpful? It does look nice! But it does not help with the corneriest cases that would benefit from such tool the most.

webpro•6mo ago

Got to start somewhere! Didn't see many examples to get inspired by either. Here's the full table: https://ansi.tools/lookup. This is my initial take on it. Please bring in the corneriest cases! It's open source so bug reports, RFCs and pull requests are most welcome.

112233•6mo ago

This thing is made out of corner cases: https://www.invisible-island.net/vttest/

I am sure capturing its output will provide an endless source of amusement and despair.

There are sequences from real terminals (e.g. stuff documented at vt100.net), sequences from ECMA-48 and friends (most of it likely never implemented), and the de facto behaviour of different software. Infamous examples are the original Windows terminal, rxvt (ugh), the Linux console, and the Emacs terminal.

Most vexing behaviour is background fill on newline, incorrect characters in terminal reports, broken scroll region, inability to write in bottom-right position etc.

This project looks fun! But it leads to endless narrow abandoned places. Hopefully you will enjoy the experience!

wonger_•6mo ago

Ghostty the terminal emulator has a cell inspector feature along these same lines

gwbas1c•6mo ago

I was a teenager when BBS's were popular. I still sometimes think I would enjoy writing an ANSI parser.

webpro•6mo ago

What would prevent you from starting? Could be fun :)

gwbas1c•6mo ago

Time: There's other nerd projects I'd like to do.

Xss3•6mo ago

Probably their free time budget

taviso•6mo ago

I've used the tool sequin in the past to debug issues: https://github.com/charmbracelet/sequin

It worked great for me, seems much easier to debug logs directly in the terminal.

webpro•6mo ago

Thanks for sharing, haven't seen that one yet. Will see if I can borrow ideas from it.

teddyh•6mo ago

No support for blinking text.

webpro•6mo ago

The parser supports it, but the HTML renderer doesn't, indeed. Using a third-party lib for that currently, but noticed the limitations too. Might replace it with my own!

teddyh•6mo ago

Great! Next step: torturetest.vt

webpro•6mo ago

There is https://github.com/webpro/ANSI.tools/blob/main/packages/pars... and others

teddyh•6mo ago

I was thinking of <http://artscene.textfiles.com/vt100/torturet.vt>.

webpro•6mo ago

The lexer and parser handled it perfectly first try. The HTML renderer had a few issues though, but it's being rendered reasonably well now. The torture test has been added to the package tests and to the website as an example.

ForOldHack•6mo ago

Does it crash the tester?

https://invisible-island.net/ncurses/tctest.htm

webpro•6mo ago

There is some recovery in the lexer, but would love to learn what would make it crash! The URL you provided gives a 404.

teddyh•6mo ago

  s/htm/html/

I.e. <https://invisible-island.net/ncurses/tctest.html>

ForOldHack•6mo ago

Blinking text?

You do not know me, but believe me, I have special skills that I have developed over many years to deal with people like you. And if I find you I will CREOL you.

teddyh•6mo ago

Bring it on. I’ll feed you your own form, you insignificant NUL character.

FlyingAvatar•6mo ago

I would have loved this in 1993. Not that I don't now, but I would have had a real use for it then.

webpro•6mo ago

At least I tried to make it look like a 1993 website

imran9m•6mo ago

Nice. This is helpful for making Jenkins CI output colorful!!
