You know, to ensure cordiality in any of the various riveting PRs and discussions.
This sort of thing is not just a funny question; it's something you think about when you're writing scanners. For instance, another "biggest possible file" is the zip file that decompresses to itself[1], which is in some sense also an infinite file. Many a scanner has been written that will fill the disk and then crash if presented with that file, which is actually more pathological behavior than you would get if the scanner weren't there at all.
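One way a scanner can avoid the fill-the-disk failure is to decompress against a fixed byte budget and a nesting limit. This is a minimal sketch, not any real scanner's code: `scan_zip`, `BudgetExceeded`, and the default limits are hypothetical names and values chosen for illustration.

```python
import io
import zipfile


class BudgetExceeded(Exception):
    """Raised when an archive expands past the scanner's limits (hypothetical name)."""


def scan_zip(data: bytes, budget: int = 10_000_000, max_depth: int = 3) -> int:
    """Decompress a zip held in memory, recursing into nested zips.

    Returns the total number of bytes decompressed. Raises BudgetExceeded
    instead of expanding without bound. The limits are illustrative.
    """
    remaining = budget

    def walk(blob: bytes, depth: int) -> None:
        nonlocal remaining
        if depth > max_depth:
            raise BudgetExceeded("archive nesting too deep")
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                chunks = []
                with zf.open(info) as member:
                    while True:
                        # Read in bounded chunks; never trust the declared size.
                        chunk = member.read(min(64 * 1024, remaining + 1))
                        if not chunk:
                            break
                        remaining -= len(chunk)
                        if remaining < 0:
                            raise BudgetExceeded("decompressed size over budget")
                        chunks.append(chunk)
                payload = b"".join(chunks)
                # A member that is itself a zip gets scanned one level deeper.
                if zipfile.is_zipfile(io.BytesIO(payload)):
                    walk(payload, depth + 1)

    walk(data, 0)
    return budget - remaining
```

Because the budget is shared across every nesting level, a self-reproducing archive runs out of depth or bytes after a bounded amount of work rather than looping forever.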
<link rel="icon" href="">
<link
rel="shortcut icon"
href='data:image/svg+xml,%3csvg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">%3ccircle cx="25" cy="50" r="20"/>%3ccircle cx="75" cy="50" r="20"/>%3c/svg>'
/>
<link rel=icon href=data:>
With the bonus you've probably already remembered how to reconstruct this on demand just by reading this comment. It is "invalid" data but so is your example on Safari and Firefox instead of Chromium-based browsers. It doesn't matter as much because that problem is local and silent in the logs, unlike the request.
However it's pretty bad on narrow screens. I wish there was some progressive enhancement via modern CSS, or at least just dark mode.
The most brilliant way to screw all Python developers I’ve ever seen.
Later learnt that the Docker container ran the code as root, so basically you could destroy the platform from within. Good times.
However, I can't put my finger on what the correct rule would be.
By that measure, there are also 1 byte valid Python programs (e.g. "1").
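For anyone who wants to check that claim, a quick sketch: Python's built-in `compile()` accepts both the empty string and the one-byte source "1" as complete programs. The filename `<tiny>` is just a placeholder label.

```python
# Both the zero-byte and the one-byte source compile and run without error.
for source in ("", "1"):
    code = compile(source, "<tiny>", "exec")
    exec(code, {})
    print(f"{len(source.encode())} byte(s): valid")
```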
$ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -print | wc -l
137
$ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -empty -print | wc -l
31
I suppose if you wanted minimal, non-empty examples, you'd end up with a "hello, world" collection, of which there are many, but nice that this handles file formats as well as programming languages.
1: If you try to run a program binary from a Bourne-like shell and execl() signals ENOEXEC, then (if it believes it to be a text file) it will try to run it as a shell script; this makes shebangs optional for programs executed only from a shell. You can try it yourself (tested on bash, dash, ksh, fish, zsh, and osh):
$ echo 'echo hi' > foo.sh
$ chmod +x foo.sh
$ ./foo.sh
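The fallback described above lives in the shell, not the kernel, which you can see by launching the same file from a program that does not retry. A rough sketch, assuming a POSIX system with `/bin/sh` and a temp directory that allows execution; `run_like_a_shell` is a hypothetical helper, and unlike a real shell it skips the "does this look like a text file" check.

```python
import errno
import os
import subprocess
import tempfile


def run_like_a_shell(path: str) -> subprocess.CompletedProcess:
    """Exec the file directly; on ENOEXEC, retry it as a /bin/sh script."""
    try:
        return subprocess.run([path], capture_output=True, text=True)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
        # The kernel refused the file (no shebang, not a binary format),
        # so do what a Bourne-like shell does and hand it to sh.
        return subprocess.run(["/bin/sh", path], capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "foo.sh")
        with open(script, "w") as fh:
            fh.write("echo hi\n")  # deliberately no shebang line
        os.chmod(script, 0o755)
        print(run_like_a_shell(script).stdout, end="")
```

Without the `except` branch, the direct `subprocess.run([path])` call fails with "Exec format error", which is the ENOEXEC that the shell quietly handles for you.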
$ for i in 3 4 5; do f=puzzle.$i; echo $f: $(head -1 $f | wc -c); tail -$((i-1)) $f; ./$f; done
puzzle.3: 1
futz
futz
./puzzle.3: line 3: futz: command not found
puzzle.4: 1
futz
futz
futz
./puzzle.4: line 4: futz: command not found
puzzle.5: 1
futz
futz
futz
futz
./puzzle.5: line 5: futz: command not found
Does this count?
RandallBrown•1d ago
eru•1d ago
JimDabell•1d ago
currysausage•1d ago
[1] https://en.wikipedia.org/wiki/Standard_Generalized_Markup_La...
[2] https://www.w3.org/TR/html401/conform.html#h-4.2
JimDabell•1d ago
> This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.
— https://www.w3.org/TR/xhtml1/#guidelines
And RFC 2854, which defines the text/html media type, explicitly states this is permissible to label as text/html:
> The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401]. In addition, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.
— https://datatracker.ietf.org/doc/html/rfc2854#section-2
However even browsers that support XHTML rendering use their HTML parser for XHTML 1.0 documents served as text/html, even though they should really be parsing them as XHTML 1.0.
But yes, that extra slash means something entirely different to the SGML formulation of HTML (HTML 2.0 to HTML 4.01). HTML5 ditched SGML though, so SHORTTAG NET is no longer a thing.
currysausage•17h ago
[XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01
is technically incorrect. While the XHTML 1 compatibility profile was compatible with HTML 4 as implemented by major browsers, that wasn't actually HTML 4. HTML 4 is based on SGML, while what was implemented was a combination of HTML 4 semantics with the tagsoup parsing rules that browsers organically developed. These rules were only later formalized as part of HTML 5.
The compatibility guidelines do recommend a space between <br and />, but (at least according to https://validator.w3.org/ in HTML 4 mode) this doesn't change anything about <br /> being a NET-enabling start-tag <br /, followed by a greater-than sign.
Enter this:
and select "Validate HTML fragment", "HTML 4.01", and "Show Outline". This is the result:
(Obviously nitpicking, but that's my point: the nitpickers can be out-nitpicked.)
JimDabell•4h ago
Elsewhere in the thread, I posted an example of SHORTTAG NET being removed from a browser to enable parsing of XHTML documents:
https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...
Nevertheless, the text/html RFC explicitly condones Appendix C, so despite it not being fully reflective of reality, it’s still permissible to use text/html to label XHTML 1.0 documents that follow Appendix C :D
myfonj•22h ago
I can probably confirm the "relevant" part of this claim for the first decade of the 2000s, but I am still (somewhat desperately) seeking information on whether ANY application that consumed "HTML", even a niche and obscure one, treated NET as specified back then. I am quite certain the W3C Validator did (Mathias' article proves that, after all), and Amaya might have done so too, since IIRC it was a reference implementation from the same spec body, but I cannot swear to that.
Does anybody here have a clearer recollection of those times, or even some evidence?
I still find it strange that such a feature had such a prominent place in the specs back then, but practically nowhere else.
JimDabell•21h ago
This is the earliest reference I could locate easily, from the www-html mailing list:
https://lists.w3.org/Archives/Public/www-html/2002Nov/0057.h...
You’ll be able to find more if you go trawling through USENET archives of places like comp.infosystems.www.authoring.html from 25–30 years ago, but it was a fairly niche subject even back then.
I think there were a couple of other niche tools that supported it, but I don’t remember the details after all this time.
JimDabell•20h ago
https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...
myfonj•20h ago
I'd even say that, at a glance, EMACS (the "W3" browser in it) seems like a possibly hugely relevant application, actually. Will look into it.
JimDabell•20h ago
https://browsers.evolt.org
It's got over a hundred ancient web browsers. I suspect none of them support SHORTTAG NET though.
myfonj•19h ago
jerf•18h ago
Speaking from my personal experience, if your idea of "valid HTML" was created in the late 1990s or early 2000s, it's worth a spin through the current HTML standard. HTML has always de facto been permissive, but de jure it had certain requirements. However, HTML 5 essentially works by reifying a very, very well-specified algorithm for how to handle HTML "loosely" (even though it is very strictly specified), and then refactors away effectively every requirement it possibly can and defers them to that algorithm instead.
Technically speaking, as long as you put down the correct doctype, you can elide almost anything nowadays and get a functional document; for instance, "<!DOCTYPE html><title>Hello</title>" is fully standards compliant now (push it through [1]). The only thing the validator gives is a warning that you might like to declare the document's language with a lang attribute on the <html> start tag. It isn't just "browsers will pretty much do the 'right thing'" with that, which has been true for a long time... that's actually standards-compliant HTML now.
What a lot of old hands don't understand is that HTML 5 was a seismic shift in how HTML is specified. Instead of specifying a rigid language and then pretending the world is complying and it's super naughty of them not to, it defines a standard for extracting a DOM tree from effectively any soup of characters you can throw at it, compliance is loosened as much as is practical, and even when things don't comply there's a specification on exactly how to pick up the pieces. HTML 5 has a completely different philosophy than HTML 4 and before.
(Relatedly, the answer to the frequently-asked question "What is the BeautifulSoup equivalent for $LANGUAGE", at least as far as parsing, is effectively now "Find an HTML 5-compliant parser", which they all have now. Beautiful Soup's parsing philosophy was enshrined into the standard.)
[1]: https://validator.w3.org/nu/#textarea
JimDabell•4h ago
> <!DOCTYPE html><title>Hello</title>" is fully standards compliant now
Sure, but switch the doctype and put a <p> on the end, and it’s fully standards compliant HTML 4.01 Strict too. And yet so many people are adamant that it can’t be. That it’s invalid (even though a validator says it’s valid). That it’s relying on error handling (even though the spec. says otherwise). That some browsers parse it wrong (but they can never name one). That the DOM ends up broken (when browser dev tools show a normal DOM). That you need <html> and <body> elements (even though it already has both). That there’s something wrong with it at a technical level (even though they cannot describe what).
The concept “This is correct HTML that works everywhere with no error handling” is very difficult for some people to grasp, to a genuinely surprising degree.
arexxbifs•1d ago