$ echo -n Vm0 | base64
Vm0w
It can be extended indefinitely one character at a time, but there will always be some suffix. (Because the output is necessarily 8/6 the size of the input, the suffix always adds 33% to the length.)
#!/usr/bin/env python3
import base64

def len_common_prefix(a, b):
    assert len(a) < len(b)
    for i in range(len(a)):
        if a[i] != b[i]:
            return i
    return len(a)

def calculate_quasi_fixed_point(start, length):
    while True:
        tmp = base64.b64encode(start)
        l = len_common_prefix(start, tmp)
        if l >= length:
            return tmp[:length]
        print(tmp[:l].decode('ascii'), tmp[l:].decode('ascii'), sep='\v')
        # Slicing beyond end of buffer will safely truncate in Python.
        start = tmp[:l*4//3+4]  # TODO is this ideal?

if __name__ == '__main__':
    final = calculate_quasi_fixed_point(b'\0', 80)
    print(final.decode('ascii'))
This ultimately produces: Vm0wd2QyUXlVWGxWV0d4V1YwZDRWMVl3WkRSV01WbDNXa1JTVjAxV2JETlhhMUpUVmpBeFYySkVUbGho

Probably not a very useful trick outside of certain specific environments.
JWT does it as well.
Even in this example, they are double base64 encoding strings (the salt).
It's really too bad that there's nothing quite like JSON. Everything speaks it and can write it. It'd be nice if something like protobuf were easier to write and read in a schemaless fashion.
asn.1 is super nice -- everything speaks it and tooling is just great (runs away and hides)
The purpose of Base64 is to encode data—especially binary data—into a limited set of ASCII characters to allow transmission over text-based protocols.
It is not a cryptographic library nor an obfuscation tool.
Avoid encoding sensitive data using Base64, and don't include sensitive data in your JWT payload unless it is encrypted first.
And of course text-based things themselves are quite wasteful.
And before "space is cheap": JWT is used in contexts where space is generally not cheap, such as in HTTP headers.
You have to ask the question "why are we encoding this as base64 in the first place?"
The answer to that is generally that base64 plays nice with HTTP headers: it has no newlines or special characters that need special handling. Then you ask "why encode JSON?" and the answer is "because JSON is easy to handle". Then you ask "why embed a base64 field in the JSON?" and the answer is "JSON doesn't handle binary data".
These are all choices that ultimately create a much larger text blob than it needs to be. And because this blob is being used for security purposes, it gets forwarded in the request headers for every request. Now your simple "DELETE foo/bar" endpoint ends up requiring a 10kb header of security data just to make the request. Or if you are doing HTTP/2, it means your LB will end up storing that 10kb blob for every connected client.
Just wasteful. Especially since it's a total of about 3 or 4 different fields with relatively fixed sizes. It could have been base64(key_length(1 byte)|iterations(4 bytes)|hash_function(1 byte)|salt(32 bytes)), which would have produced something like a 51-character base64 string. The example is 3x that size (156 characters). It gets much worse than that on real systems I've seen.
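To make that concrete, here's a minimal Python sketch of that hypothetical layout (the field names and sizes are just the ones proposed above, not any real standard):

import base64
import struct

# Hypothetical layout from above: key_length (1 byte), iterations (4 bytes),
# hash_function id (1 byte), salt (32 bytes) = 38 bytes total.
# A real salt would come from os.urandom(32); zeros keep the sketch short.
packed = struct.pack(">BIB32s", 32, 600000, 1, bytes(32))
token = base64.b64encode(packed).decode("ascii")
print(len(packed), len(token))  # 38 bytes in -> 52 base64 chars (51 without padding)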
messagepack/cbor are very similar to json (schemaless, similar primitive types) but can support binary data. bson is another similar alternative. All three have implementations available in many languages, and have been used in big mature projects.
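For example, with the msgpack Python package (a sketch, assuming it's installed; cbor2's dumps/loads read almost identically), binary fields round-trip with no base64 layer at all:

import msgpack  # pip install msgpack

doc = {"hash": "scrypt", "iterations": 16384, "salt": b"\x00" * 32}
blob = msgpack.packb(doc)         # the 32 salt bytes are embedded raw
restored = msgpack.unpackb(blob)
print(restored == doc)            # True; bytes come back as bytes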
If you just want a generic, binary, hierarchical type-length-value encoding, have you considered https://en.wikipedia.org/wiki/Interchange_File_Format ?
It's not that there are widely-supported IFF libraries, per se; but rather that the format is so simple that as long as your language has a byte-array type, you can code a bug-free IFF encoder/decoder in said language in about five minutes.
(And this is why there are no generic IFF metaformat libraries, ala JSON or XML libraries; it's "too simple to bother everyone depending on my library with a transitive dependency", so everyone just implements IFF encoding/decoding as part of the parser + generator for their IFF-based concrete file format.)
What's IFF used in? AIFF; RIFF (and therefore WAV, AVI, ANI, and — perhaps surprisingly — WebP); JPEG2000; PNG [with tweaks]...
• There's also a descendant metaformat, the ISO Base Media File Format ("BMFF"), which in turn means that MP4, MOV, and HEIF/HEIC can all be parsed by a generic IFF parser (though you'll miss breaking some per-leaf-chunk metadata fields out from the chunk body if you don't use a BMFF-specific parser.)
• And, as an alternative, there's https://en.wikipedia.org/wiki/Extensible_Binary_Meta_Languag... ("EBML"), which is basically IFF but with varint-encoding of the "type" and "length" parts of TLV (see https://matroska-org.github.io/libebml/specs.html). This is mostly currently used as the metaformat of the Matroska (MKV) format. It's also just complex enough to have a standalone generic codec library (https://github.com/Matroska-Org/libebml).
My personal recommendation, if you have some structured binary data to dump to disk, is to just hand-generate IFF chunks inline in your dump/export/send logic, the same way one would e.g. hand-emit CSV inline in a printf call. Just say "this is an IFF-based format" or put an .iff extension on it or send it as application/x-iff, and an ecosystem should be able to run with that. (And just like with JSON, if you give the IFF chunks descriptive names, people will probably be able to suss out what the chunks "mean" from context, without any kind of schema docs being necessary.)
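As a sketch of how little code that takes (this uses the classic big-endian IFF chunk layout; RIFF flips the length field to little-endian):

import io
import struct

def write_chunk(out, chunk_id, payload):
    # Classic IFF chunk: 4-byte ASCII id, 4-byte big-endian payload length,
    # the payload, then one pad byte if the payload length is odd.
    assert len(chunk_id) == 4
    out.write(chunk_id)
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)
    if len(payload) % 2:
        out.write(b"\x00")

def read_chunks(data):
    pos = 0
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + length]
        pos += 8 + length + (length % 2)  # skip the pad byte on odd lengths

buf = io.BytesIO()
write_chunk(buf, b"NAME", b"hello")
print(list(read_chunks(buf.getvalue())))  # [(b'NAME', b'hello')]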
I got grief for saying that I prefer TLV data over textual data (even if the data is text) because of how easy it is to write code to output and ingest this format, and it is way, WAY faster than JSON will ever be.
It really is a very easy way to get much faster transmission of data over the wire than JSON, and it's dead easy to write viewers for. It's just an underrated way to store binary data; storing things as binary is underrated in general.
"eeey bruh, open the the API it's me"
Actual RSA OID is somewhere in the middle.
`ey` could be any JSON, but it's most likely going to be a JWT.
Neither is a perfect signal, but contextually each is more likely correct than not.
I work with this stuff often enough to recognize something that looks like a key or a hash. I don't work with it often enough to have picked up `ey` and `LS`.
The PEM format (which begins with `-----BEGIN [CERTIFICATE|CERTIFICATE REQUEST|PRIVATE KEY|X509 CRL|PUBLIC KEY]-----`) is already Base64 within the body; the header and footer are ASCII, and shouldn't be encoded[0] (there's no link for the claim, so perhaps there's another format similar to PEM?)
You can't spot private keys, unless they start with a repeating text sequence (or use the PEM format with header also encoded).
In practice, you will spot fully b64 encoded PEMs all the time once you have Kubernetes in play... create a Secret from a file and that's what you will find.
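That's also where the `LS` mentioned above comes from: the leading dashes of the PEM header. A quick check:

import base64

enc = base64.b64encode(b"-----BEGIN CERTIFICATE-----")
print(enc[:8])  # b'LS0tLS1C' -- the leading dashes always encode to "LS0t..."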
Spending hours wrangling sendmail.cf, and finally succeeding, felt like a genuine accomplishment.
Nowadays, things just work, mostly. How boring.
I recently installed Tru64 UNIX on a DEC Alpha I got off eBay. I felt like it was more sluggish than it should be, so I looked around at man pages about the VM (virtual memory, not virtual machine) subsystem, and was amazed at how cleanly and in how much detail it was described, and what insights I could get about its state. The sys_attrs_vm man page alone, which just describes every VM-layer tunable, gave a pretty good description of what the VM subsystem does, how each of those tunables affects it, and why you might want to change it.
Nowadays, things are massively complex, underdocumented (or just undocumented), constantly changing, and often inconsistent between sub-parts. Despite thinking that I have both wide and deep knowledge (I'm a low-level kernel code dev), it often takes me ages to figure out the root cause of sometimes even simple problems.
When one of my tests crashed one of those unprotected mainframes, two guys who were then close to my age now stared at an EBCDIC core dump, one of them slowly hitting page down, one Matrix-like screen after another, until they both jabbed at the screen and shouted "THERE!" simultaneously.
(One of them hand delivered the first WATFOR compiler to Yorktown, returning from Waterloo with a car full of tapes. I have thought of him - and this "THERE!" moment - every time I have come across the old saw about the bandwidth of a station wagon.)
It doesn’t even need to be much better than ROT13. Security by obscurity is good for this situation.
$ echo '{"' | base64
eyIK
$ echo "{\"" | base64
https://altcodeunicode.com/ascii-american-standard-code-for-...
The first nibble (hex digit) shows your position within the chart, approximately like 2 = punctuation, 3 = digits, 4 = uppercase letters, 6 = lowercase letters. (Yes, there's more structure than that considering it in binary.)
For digits (first nibble 3), the value of the digit is equal to the value of the second nibble.
For punctuation (first nibble 2), the punctuation is the character you'd get on a traditional U.S. keyboard layout pressing shift and the digit of the second nibble.
For uppercase letters (first nibble 4, then overflowing into first nibble 5), the second nibble is the ordinal position of the letter within the alphabet. So 41 = A (letter #1), 42 = B (letter #2), 43 = C (letter #3).
Lowercase letters do the same thing starting at 6, so 61 = a (letter #1), 62 = b (letter #2), 63 = c (letter #3), etc.
The tricky ones are the overflow/wraparound into first nibble 5 (the letters from letter #16, P) and into first nibble 7 (from letter #16, p). There you have to actually add 16 to the letter position before combining it with the second nibble, or think of it as "letter #0x10, letter #0x11, letter #0x12...", which may be less intuitive for some people.
Again, there's even more structure and pattern than that in ASCII, and it's all fully intentional, largely to facilitate meaningful bit manipulations. E.g. converting uppercase to lowercase is just a matter of adding 32, or a logical OR with 0x20 (binary 0010 0000). Converting lowercase to uppercase is just a matter of subtracting 32, or a logical AND with 0xDF (binary 1101 1111).
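A quick sketch of those bit tricks in Python:

# Uppercase to lowercase: OR in the 0x20 bit; lowercase to uppercase: mask it off.
print(chr(ord("A") | 0x20))          # a
print(chr(ord("a") & 0xDF))          # A
# The second nibble is the letter's position in the alphabet:
print(hex(ord("C")), hex(ord("c")))  # 0x43 0x63 -- letter #3 either way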
For reading hex dumps of ASCII, it's also helpful to know that the very first printable character (0x20) is, ironically, blank -- it's the space character.
0 1 2 3 4 5 6 7 8 9 A B C D E F
..
2 ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ \ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ~
I don't have a mnemonic for punctuation characters with second nibble >9, or for the backtick. The @ can be remembered via Ctrl+@, which is a way of typing the NUL character, ASCII 00 (also not coincidental; compare to Ctrl+A, Ctrl+B, Ctrl+C... for inputting ASCII 01, 02, 03...).

I use a different layout so I'd never realised there was method to the madness! I get the following:
$ echo -n ' !@#$%^&*(' | xxd -p
2021402324255e262a28
https://en.wikipedia.org/wiki/File:Remington_2_typewriter_ke...
I forget the story about what changed for shift-6 through shift-9.
When I say "traditional U.S. keyboard layout" I mean to contrast this with the modern one, which is the same as what you and I have.
Good times.
{" is ASCII 01111011, 00100010
Base64 takes 3 bytes x 8 bits = 24 bits, groups that 24 bit-sequence into four parts of 6 bits each, and then converts each to a number between 0-63. If there aren't enough bits (we only have 2 bytes = 16 bits, we need 18 bits), pad them with 0. Of course in reality the last 2 bits would be taken from the 3rd character of the JSON string, which is variable.
The first 6 bits are 011110, which in decimal is 30.
The second 6 bits are 110010, which in decimal is 50.
The last 4 bits are 0010. Pad it with 00 and you get 001000, which is 8.
Using an encoding table (https://base64.guru/learn/base64-characters), 30 is e, 50 is y and 8 is I. There's your "ey".
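You can check this in a couple of lines; the fourth output character already depends on the byte after the quote:

import base64

print(base64.b64encode(b'{"a'))  # b'eyJh'
print(base64.b64encode(b'{"x'))  # b'eyJ4' -- "eyJ" is fixed, the fourth character varies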
Funny how CS people are so incurious now; this blog post touches the surface but didn't get into the explanation.
https://web.cs.ucdavis.edu/~rogaway/classes/188/materials/th...
I've been doing this a long time but until today the only one I'd noticed was "MII".
> I did a few tests in my terminal, and he was right!
He clearly had no clue how base64 worked. You don't need a test if you know it.
> As pointed out by gnabgib and athorax on Hacker News, this actually detects the leading dashes of the PEM format
They needed help for this. I'm not sure that they have even now opened Wikipedia to understand how base64 works. The whole article has an "it's magic!" vibe.
They could just as easily have felt the underlying reason was so obvious it wasn’t worth mentioning.
I know how base64 encoding works but had never noticed the pattern the author pointed out. As soon as I read it, I understood why. It didn't occur to me that the author should have explained it at a deeper level.
I was incredulous but gave it a go, and it worked!!
Even if you don't notice the ey specifically, the string itself just screams base64 encoding, regardless of what's actually inside.

One blog post is hardly enough to judge someone as ignorant, but after a quick look at the author's writing/coding/job history, I doubt he is that either.
I think it's fantastic that you can look at a string and feel its base64 essence come through without a decoder. Thinking about it for a minute, I suspect I could train myself to do the same. If someone who already knew how to do it well wrote a how-to, I bet it would hit the front page and inspire many people, just like this article did.
I just don't get the urge to dump on the original author for sharing a new-to-him insight.
I know eyJhbG by heart
Base64 works on blocks of 3 input bytes, and these blocks can be considered independent of each other. So for example, with the string "Hello world", you can do the following base64 transformations:
* "Hel" -> "SGVs"
* "lo " -> "bG8g"
* "wor" -> "d29y"
* "ld" -> "bGQ="
These encoded blocks can then be concatenated together and you have your final encoded string: "SGVsbG8gd29ybGQ="
(Notice that the last one ends in an equals sign. This is because the input is less than 3 characters, and so in order to produce 4 characters of output, it has to apply padding - part of which is encoded in the third digit as well.)
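A quick check of that concatenation property:

import base64

parts = [b"Hel", b"lo ", b"wor", b"ld"]
joined = b"".join(base64.b64encode(p) for p in parts)
print(joined)                                      # b'SGVsbG8gd29ybGQ='
print(joined == base64.b64encode(b"Hello world"))  # True (only the final block is short)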
It's important to note that this is simply a byproduct of the way that base64 works, not actually an intended thing. My understanding is that it's basically like how if you take an ASCII character - which could be considered a base 256 digit - and convert it to hexadecimal (base 16), the resulting hex number will always be two digits long - the same two digits, at that - even if the original was part of a larger string.
In this case, every three base 256 digits will convert to four base 64 digits, in the same way that it would convert to six base 16 digits.
Besides that, I just spent way too much time figuring out this is an encrypted OpenTofu state. It just looked way too much like a terraform state but not entirely. Tells ya what I spend a lot of time with at work.
This is probably another interesting situation in which you cannot read the state, but you can observe changes and growth by observing the ciphertext. It's probably fine, but remains interesting.
Is this the state of modern understanding of basic primitives?
Also, it seem like the really important point is kind of glossed over. Base64 is not a kind of encryption, it's an encoding that anybody can easily decode. Using it to hide secrets in a GitHub repo is a really really dumb thing to do.
It was amazing to see him decode VISA and MASTER transactions on the fly in logs and other places.
Edit: Now that I looked at it a little deeper, i'm assuming they are talking about these[0] sort of files?
- `R0lGOD` - GIF files
- `iVBOR` - PNG files
- `/9j/` - JPG files
- `eyJ` - JSON
- `PD94` - XML
- `MII` - ASN.1 file, such as a certificate or private key
These are nice to know since they show up pretty frequently (images in data: URLs, JSON/XML/ASN.1 in various protocols).
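A toy sketch of that recognition as code (the prefix-to-type table is just the list above):

PREFIXES = {
    "R0lGOD": "GIF image",
    "iVBOR": "PNG image",
    "/9j/": "JPEG image",
    "eyJ": "JSON (likely a JWT if it has three dot-separated parts)",
    "PD94": "XML",
    "MII": "ASN.1/DER, e.g. a certificate or key",
}

def sniff(b64: str) -> str:
    for prefix, kind in PREFIXES.items():
        if b64.startswith(prefix):
            return kind
    return "unknown"

print(sniff("eyJhbGciOiJIUzI1NiJ9"))  # JSON (likely a JWT ...)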
It makes more sense to transmit binary formats in binary.
You would save bandwidth, memory and a decoding step.
Then you could also inspect the header bytes, instead of memorizing how they present in some intermediate encoding.
I wrote a glTF model converter once. 99% of those millions of JSON files I wrote were base64-encoded binary data.

A single glTF model sometimes wants to be two files on disk: one for the JSON and one for the binary data, and you use the JSON to describe where in the binary data the vertices are defined, plus other windows for where the various other bits (the triangles, triangle fans, textures, and other stuff) are stored. But you can also base64 encode that data and put it in the JSON file and not have a messy double-file model. So that's what I did, and I hated it. But it still felt better than having .gltf files and .bin files which together made up a single model file.
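For the curious, the embedded form looks roughly like this (a trimmed-down sketch, not a complete glTF asset):

import base64
import json
import struct

# Illustrative only: one buffer holding three x,y,z vertex positions,
# embedded as a base64 data URI instead of a separate .bin file.
vertices = struct.pack("<9f", 0, 0, 0, 1, 0, 0, 0, 1, 0)
gltf = {
    "buffers": [{
        "byteLength": len(vertices),
        "uri": "data:application/octet-stream;base64,"
               + base64.b64encode(vertices).decode("ascii"),
    }]
}
print(json.dumps(gltf, indent=2))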