The Princeton INTERCAL Compiler's source code

https://esoteric.codes/blog/published-for-the-first-time-the-original-intercal72-compiler-code

140•surprisetalk•1d ago

Comments

entrepy123•1d ago

The INTERCAL satirical language (the original compiler's source code of which is announced as recovered in TFA) was covered in a couple of episodes [0, 1] of the (highly recommended, if I may say) Advent of Computing podcast [2].

  [0] https://adventofcomputing.libsyn.com/website/episode-78-intercal-and-esoterica
  [1] https://adventofcomputing.libsyn.com/website/episode-158-intercal-rides-again-restoring-a-lost-compiler
  [2] https://adventofcomputing.com

lukasb•1d ago

I always associated INTERCAL with COME FROM but it turns out that was "invented" separately (although its first implementation was in C-INTERCAL.)

MangoToupe•1d ago

I thought come-from was an early scheme primitive. It's pretty trivial to implement and it's a fun trick.

Edit: i am very wrong. i also had no clue it was meant as a joke. I just figured someone had a use case.

vnorilo•1d ago

The recent C# feature called interceptors [1] pretty much looks like comefrom from where I stand. Yet everyone talking about it has either been serious, or very good at trolling.

1: https://khalidabuhakmeh.com/dotnet-8-interceptors

pjmlp•1d ago

I am completly against them, I think they have re-invented Microsoft Fakes, and PostSharp, only badly.

skissane•1d ago

From your link:

[InterceptsLocation("/Users/khalidabuhakmeh/RiderProjects/ConsoleApp12/ConsoleApp12/Program.cs", line: 3, character: 3)]

they added a language feature which is sensitive to precise line/character offsets in your source code, so the tiniest change to the source code invalidates your code…

I’m speechless. Whatever they are aiming to achieve here, surely there is a more elegant, less ugly way

poizan42•1d ago

You are not supposed to use interceptors in code you write yourself. The feature exists for Roslyn Source Generators that runs every time you build the code.

ninkendo•1d ago

I’m still confused though, if you’re generating the code anyway, why do you need an interceptor? Can’t you just generate the code to match what you want to redirect to, directly inline?

poizan42•1d ago

Yes if all the code was generated. The problem is when you want to modify the behavior of user-supplied code - Roslyn Source Generators are additive so you cannot make modifications directly to user-supplied code.

You can read about how they work here: https://github.com/dotnet/roslyn/blob/main/docs/features/inc...

Basically they get the files (and ambient metadata) that are part of the compilation, filter to the parts it depends on, transforms to a in-mem representation of the data needed for the code generation, and then finally adds new files to the compilation. Since they can only add new files they cannot e.g. add code to be executed before or after user code is executed like with AOP. Interceptors are a solution to that problem.

ninkendo•1d ago

Interesting, so you’re saying generated code can change the behavior of user code with no indication this is happening from directly reading the user code… that sounds pretty horrifying. I guess AOP in general is pretty horrifying to me though. Maybe it’s useful if you restrict its use to very specific things like logging or something.

poizan42•1d ago

Well yes, hopefully you know what you are doing when you reference a source generator. This could ofc. also be done with custom msbuild task that modfies the code sent to the compiler or the assembly after compilation (like Fody), Source Generators just makes the process more streamlined and integrates with things like IntelliSense.

ninkendo•1d ago

> Well yes, hopefully you know what you are doing when you reference a source generator

I don't think there's much that's scary about generating source code in general. If it's self-contained and you have to actually call the generated code to use it, it's not really much different than any other code. But the idea of having code A change the behavior of code B is what's horrifying, regardless of whether code A is generated or not. If I'm reading code B I want to be able to reason about what I see without having to worry about some spooky action at a distance coming from somewhere else.

jayd16•1d ago

> that sounds pretty horrifying.

Things are constantly doing this. Frameworks use reflection or markup or all other kinds of things that count as magic if you don't bother to understand what's going on.

anyfoo•1d ago

COBOL has an almost-COME-FROM, its called “ALTER” and it changes the destination of a GO TO, but it’s “discouraged” nowadays.

skissane•1d ago

Historically, COBOL has had some rather fascinatingly unusual features-my favourite is OPEN INPUT REVERSED, which opens files for reading backwards-not sure if this was ever in any of the COBOL standards, but it exists in IBM mainframe COBOL, and many implementations on other platforms have copied it from there.

It usually only works for files with fixed length records, and reads the records in reverse sequential order, but the bits/bytes within the record in forward order. In theory, it could be made to work for variable-length record files as well, but I’m not aware of any implementation which does.

The original motivation was to support reading magnetic tapes backwards, so you could write data to a tape, then read it back in without a time-consuming rewind - which was important in the early years of computing, when memory and disk sizes were so small, magnetic tapes were commonly used for temporary storage / work files.

Most tape drives nowadays don’t support reading tapes backwards, even though there are standard SCSI commands defined to do so-but I believe IBM 3592 tape drive series still supports this, as do virtual tape servers targeted at mainframes. The drive physically reads the bits off the tape in reverse order, but then reverses the data before sending it to the computer.

I’m not aware of any other language which supports reading files backwards, other than COBOL. Well, mainframe assembly does, and you can invoke the relevant mainframe IO calls from languages such as C-but COBOL is the only language I know of which has it as a language feature.

Sesse__•1d ago

Many languages have exceptions, which are essentially COME FROM.

Duanemclemore•1d ago

Probably my single favorite podcast episode ever is the Future of Coding's episode on INTERCAL [0]. At least I think they tried to make it about INTERCAL.

No matter what - a consideration befitting the subject.

[0]https://futureofcoding.org/episodes/064.html

ghssds•1d ago

They call it an esoteric language but event-driven programming is basically based on "COME FROM" structures. Shell also extensively uses PLEASE-like fonctionality in the form of SUDO. The syntax of INTERCAL is very clean if we compare it to, say, regex.

anyfoo•1d ago

Not sure about regex. Its syntax is, by definition, regular. (Okay, most regex engines people use aren’t technically fully regular languages anymore, but that just makes things more convenient.)

eru•1d ago

> Its syntax is, by definition, regular.

No, not at all. Regexes describe regular languages, but that doesn't mean that their own syntax needs to be a regular language.

Eg many regexes allow parens, so you can have both 'ab|c' and 'a(b|c)'. But if you require your parens to be balanced, that's no longer a regular language.

skissane•1d ago

> But if you require your parens to be balanced, that's no longer a regular language

From a certain pedantic perspective, it still is a regular language, since there is a finite (implementation dependent) upper bound on nesting depth, and balanced parens with a finite upper bound of nesting is regular.

OTOH, one can appeal to the linguistic distinction between competence and performance, and say that regexes are irregular in pure competence, even though all (practically usable) languages turn out to be regular once we constrain them by performance as well.

eru•1d ago

> From a certain pedantic perspective, it still is a regular language, since there is a finite (implementation dependent) upper bound on nesting depth, and balanced parens with a finite upper bound of nesting is regular.

That's a silly point, because it makes all languages trivially finite, even monsters like C++. (Finity is even more of a restriction than being regular.)

skissane•1d ago

Is it silly? Well, if a person feels drawn to an ultrafinist philosophy of mathematics, they may feel it is important to emphasise the inherent finitude of all creations of inherently finite human minds, formalisms such as programming languages included.

eru•15h ago

Oh, finitism is perfectly fine by me. I have a lot of sympathy for it!

What is silly is not finitism (or ultrafinitism), but trying to apply concepts like 'regular language' or 'context free language' without modification in such a setting.

I suspect you should be able to suitably alter most of these concepts to make sense in the strictly finite setting. But they are going to become a lot more complicated.

For a similar flavour, you can also check out the differences and similarities between differential equations and difference equations.

skissane•14h ago

Right, but the “finite setting” is the real world. Yes, if we talk about C++ as a mathematical abstraction, maybe a C++ program containing a googolplex tokens could be in a theoretical sense syntactically and semantically valid. But such a program can’t exist in the real world

Let’s say a compiler has a (very plausible) implementation restriction that the source code of a program can be no longer than 2^32 bytes. Then, even if the abstract grammar is (e.g.) an unbounded context-free language, the actually accepted language has a finite upper bound on length-and it is well-known that applying a length constraint turns a context-free language into a regular language. This doesn’t require us to “to suitably alter most of these concepts to make sense in the strictly finite setting”, because formal language theory in an infinite setting permits constraining languages to be finite, and we have a very good understanding of what that the results of that are - this doesn’t require any new or different math, just applying the same math.

Now, it is true that, we will say that certain algorithms only work for regular languages, not context-free - but once we impose a finite bound on length, those algorithms do actually work in principle. In practice, of course, they are likely impractically slow - but the formalism we are talking about (the Chomsky hierarchy) is based on computability (can it answer the question in a finite but unbounded amount of time), not asymptotic computational complexity (nor real-world performance, which isn’t the same thing, as e.g. galactic algorithms demonstrate)

eru•6h ago

And those formalisms all become silly. Computability also becomes trivial, if you only have a finite amount of memory: either your program halts after a finite amount of time, or it enters a loop after at most 2^(number of bits in your memory) steps.

anyfoo•19h ago

You're right!

eterm•19h ago

> if you require your parens to be balanced, that's no longer a regular language.

TIL: https://en.wikipedia.org/wiki/Pumping_lemma_for_regular_lang...

As someone who has only recently learned the formal definition of regular language ( Thanks to https://www.youtube.com/watch?v=9syvZr-9xwk as mentioned on this board recently ), I'm interested in the formal proof for this?

It feels intuitively true, but I haven't finished the course yet and therefore haven't come across it and can't yet reason well about it.

Is the problem then in trying to "encode" whether brackets are balanced, that we would need infinite "levels" of how many more brackets we've opened rather than closed, (i.e a stack) and that violates the finite nature of finite automota?

So I've googled it now, and I've found the result is due to the "Pumping lemma": https://en.wikipedia.org/wiki/Pumping_lemma_for_regular_lang...

gbacon•16h ago

Pulling on the thread of your intuition about needing infinite levels, a regular language is one that can be recognized by a finite automaton, so requiring infinite memory or arbitrarily high counting suggests that the language is not regular. It’s not perfect; we prove it as below.

Assume for purpose of contraction that our language L of balanced parentheses is regular. Then L has pumping length p ≥ 1 such that every w ∈ L where |w| ≥ p can be decomposed w = xyz where |y| ≥ 1 and |xy| ≤ p. It is called the pumping lemma because we can “pump” the non-empty substring y either up or down, i.e., xy⁰z = xz and xyⁿz are also all in L. In formal languages, exponentiation notates character or string repetition.

We don’t know p and may not cherry pick it. Pretend you are playing a game against an adversary who gets to select arbitrary p ≥ 1, and then you use that p to choose w ∈ L to spoil the regular language party. With the language of balanced parentheses, no matter what p the adversary selects, you can force the leading xy prefix to contain all left-parentheses by choosing w = (ᵖ)ᵖ. The substring y cannot be empty, so pumping down gives xy⁰z = (ᵖ⁻ᵏ)ᵖ belongs to L, a contradiction.

One of my favorite test questions for automata theory classes is to have ChatGPT attempt a proof that a language is not regular and then have students correct the proof. You linked to a lecture by Michael Sipser, and the undergraduate version of the course I teach uses his excellent textbook.

https://www.amazon.com/Introduction-Theory-Computation-Micha...

His lecture on the pumping lemma for regular languages is

https://youtu.be/KAySmSEGc9U?t=1807

eterm•15h ago

I've just watched that video, thank you, and it's interesting that he shows an example where my intuition of counting leading to non-regularity fails:

We've seen you can't detect if there is a balanced number of 0's and 1's in a string of 0's and 1's with a FA/Regular expression, howver you CAN detect if there are a balanced number of "01" and "10" substrings in a string of 0's and 1's.

Proving that 01 and 10 balancing was left as an exercise to the reader, which I will try to work through here, hopefully I've not made a mistake. My intuition here says that it is equivalent to checking whether it starts and ends with the same symbol, because 01 leaves you having last read a 1, and therefore then requires a "10" substring to get back to having last read a 0, and vice-versa.

The FA I believe therefore can be constructed with transitions

    Q0 -> epsilon -> Q1
    Q0 -> epsilon -> Q3
    Q1 -> 0 -> Q2
    Q2 -> 1 -> Q1
    Q1 -> epsilon -> Q5
    Q3 -> 1 -> Q4
    Q4 -> 0 -> Q3
    Q3 -> epsilon -> Q5
    ( Starting Q0, Accepting Q5 )

Or presented differently, (0(0U1)0) U (1(0U1)1) U epsilon

xxmarkuski•1d ago

INTERCAL is presented at the Christmas Lecture in Programming at KIT

breakingcups•1d ago

"[...] to English, where the interpreter was once literally another person but is now downgraded to an AI prompt system."

az09mugen•22h ago

Since there is the source now, does someone know what quantity/proportion of "PLEASE" the developer needs to write ?

Because IIRC, if you don't put enough or too much "PLEASE", the compiler won't compile.

moritzwarhier•21h ago

Let's start with: I don't know. Also I can't read this language (BASIC variant?) and am browsing from a phone.

But this line at least outputs the error message:

https://github.com/rottytooth/INTERCAL72/blob/f94e0c8eaaf134...

edit: probably slightly complicated :D

but this makes me wonder if it's related to the source mutliplying some variable/register value by 3 in the error line:

> You will need at least 3 lines of code, with one PLEASE statement among them.

(from README)

Just searching "PLEASE" also shows increments and a comparison with a number written in hex... so it should be easy to figure out the intercal "please" halting problem instance, right? :)

az09mugen•20h ago

Thank you so much for the time you took answering that existential question of mine! (And maybe others)

moritzwarhier•19h ago

Thanks for your reply: apparently, sometimes questions as answers can be helpful :)

tasty_freeze•20h ago

I'm waiting for the self-hosting version of the compiler before I deploy it in my own projects.

Speeding up PyTorch cold start times

Don't McBlock Me

Physicality: The New Age of UI

Ask HN: What are some unexplored or poorly explored applications of Comp Vision?

Guilty by Algorithm

Jemalloc Archived (2005-2025)

I couldn't find a co-founder, so I built one

CRIF has a 'score' for almost everyone in Austria

DNS rebinding attacks explained: The lookup is coming from inside the house

Storybook 9

Why Is the US Dropping Billions of Mutant Flies from the Sky? [video]

Linux 6.16 Brings Many Laptop Driver Improvements, New Dasharo ACPI Driver

We built a tool to audit electric and water bills for overcharges

Reasoning Is Trained, Not Prompted

A toy debugger written in Rust

We turned public transit into a multiplayer game

Newfound Mechanism Rewires Cellular Energy Processing for Drastic Weight Loss

Ask HN: Who is using C?

What's Wrong with West Virginia's Bead Proposal?

About the OIDC Conformance Suite

M&S faces 'unprecedented' customer lawsuit over cyberattack data breach

Quantum Punks Manifesto (2024)

In Which I Make the Mistake of Covering an Episode of the All-In Podcast

Microsoft adds quick machine recovery to Windows 11 settings

How Ukraine truck FPV drone attack happened?

Secure Minions: private collaboration between Ollama and frontier models

Show HN: I made the bridge between chatbots and live gated data

UK Moving Ahead, Wants Input on ELoran Program

Denmark gets more serious about digital sovereignty

(On | No) Syntactic Support for Error Handling

Speeding up PyTorch cold start times

Don't McBlock Me

Physicality: The New Age of UI

Ask HN: What are some unexplored or poorly explored applications of Comp Vision?

Guilty by Algorithm

Jemalloc Archived (2005-2025)

I couldn't find a co-founder, so I built one

CRIF has a 'score' for almost everyone in Austria

DNS rebinding attacks explained: The lookup is coming from inside the house

Storybook 9

Why Is the US Dropping Billions of Mutant Flies from the Sky? [video]

Linux 6.16 Brings Many Laptop Driver Improvements, New Dasharo ACPI Driver

We built a tool to audit electric and water bills for overcharges

Reasoning Is Trained, Not Prompted

A toy debugger written in Rust

We turned public transit into a multiplayer game

Newfound Mechanism Rewires Cellular Energy Processing for Drastic Weight Loss

Ask HN: Who is using C?

What's Wrong with West Virginia's Bead Proposal?

About the OIDC Conformance Suite

M&S faces 'unprecedented' customer lawsuit over cyberattack data breach

Quantum Punks Manifesto (2024)

In Which I Make the Mistake of Covering an Episode of the All-In Podcast

Microsoft adds quick machine recovery to Windows 11 settings

How Ukraine truck FPV drone attack happened?

Secure Minions: private collaboration between Ollama and frontier models

Show HN: I made the bridge between chatbots and live gated data

UK Moving Ahead, Wants Input on ELoran Program

Denmark gets more serious about digital sovereignty

(On | No) Syntactic Support for Error Handling

The Princeton INTERCAL Compiler's source code

Comments