Give it time until we have truly globally multilingual models for superior context awareness.
I'd love to see some quantification of errors in q/kdb+ (or Hebrew) vs. languages of similar size that are left-to-right.
From the perspective of an LLM learning from Unicode, this would appear as a delimiter that needs to be inserted on language direction boundaries; but everything else should work the same.
Everything is written sequentially in the sense that the character that is written first can only be followed by the character that is written next. In this sense writing non-sequentially is logically impossible.
Basically, the numbers 1234 and 4321 are identical assuming one is written left to right and the other is right to left. Then it's just a convention which way you are used to reading.
I know nothing of Old (or New) Hebrew unfortunately so I may be completely off base.
So, in the word “gif”, they would start writing the “f” first and finish writing the “i” first (just before writing the last part of the “f”). For “if”, writing the “f” would start before writing the “i” started and finish after writing the “i” finished.
In traditional printing “writing” can happen simultaneously for an entire page, but colour printing can make things more complex.
Exercise for the reader to guess how line breaks, text wrapping, and search algorithms worked.
I'm convinced that's the case. On any major LLM I can carpet-bomb Java/Python boilerplate without issue. For Rust, at least last time I checked, it comes up with non-existent traits, hallucinates more frequently, and generally struggles to use the context effectively. In agent mode it turns into a fist fight with the compiler, often ending in credit-destroying loops.
And don't get me started when using it for Nix...
So not surprised about something with orders of magnitude smaller public corpus.
For Nix, it is a nice template engine to get started or to search. I have not tried big Nix changes.
Did your experiment consist of asking an LLM to design a programming language for itself?
There may be emergent abilities that arise in these models purely due to how much information they contain, but I'm unconvinced that their architecture allows them to crystallize actual understanding. E.g. I'm sceptical that there'd be an area in the LLM weights that encodes the logic behind arithmetic and gives rise to the model actually modelling arithmetic as opposed to just probabilistically saying that the text `1+1=` tended to be followed by the character `2`.
1. Function application should be left to right, e.g. `sqrt 4`
2. Precedence order should be very simple. In k, everything has the same precedence (with the exception of brackets).
Infix arithmetic like `1 + 2` is what forces you to have this right-to-left convention, annoyingly (illustrated below).
Fwiw, I think 2 is great and I would rather give up 1 than 2. However, writing function application as `my_fun arg` is a very strong convention.
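For anyone who hasn't used q, a few illustrative lines (numbers arbitrary) showing how 1 and 2 interact:

    sqrt 4       / left-to-right application: 2f
    2*3+4        / uniform precedence, everything groups to the right: 2*(3+4) = 14
    sqrt 4+5     / the argument of sqrt is everything to its right: sqrt 9 = 3f

You do write the function on the left, but because its right operand is the whole rest of the line, evaluation effectively runs right to left.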
But it's not a good idea to use regexes in code that you're going to use long term. It's justifiable for simple regexes, and many people go against this advice, but really for anything remotely complex regexes become totally unreadable and extremely bug prone. Complex regexes are a huge code smell and array languages are pretty much one enormous regex.
I wrote something like that in C# once [0] but I'm not getting the impression that there's a lot of demand for that kind of thing.
[0] https://share.unison-lang.org/@unison/website/code/main/late...
I think it's obvious that Cyrillic isn't any less usable than the Latin alphabet in any objective sense. In fact, I'm using English orthography, which has all kinds of unnecessary usability problems which aren't present in any Cyrillic orthography that I know of. But familiarity is a much stronger factor; even today I can barely sound out words in Russian or Ukrainian, while English text printed in Latin letters is clearer to me than speech.
On theoretical grounds, I suspect that the APL syntax Gabi is calling RL-NOP is less usable for left-to-right readers than at least LR-NOP and maybe even conventional Please Brutally Execute My Dear Aunt Sally operator precedence. But familiarity is such a strong force that this hypothesis is very difficult to test.
The theoretical grounds are that, when reading left to right, a reader must maintain a stack of pending operators and values in their mind, unless they are saved by parentheses. (The Iverson quote disagrees with this, but I think Iverson was wrong.) Maintaining mental stacks is difficult and error-prone; this is the reason for the Tim Peters proverb, "Flat is better than nested."
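To make the stack point concrete, here is a small q-style example (numbers made up):

    2*3+4*5    / reading left to right you must hold "2*", then "3+", then "4*" pending,
               / and only unwind at the end: 4*5=20, 3+20=23, 2*23=46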
I suspect that operator precedence might be superior for two reasons:
1. It more often avoids parentheses, which are extra symbols to recognize and correctly pair up in your mind.
2. The meaning of a high-precedence subexpression like `x×b` is almost context-independent: although an exponentiation operator or something like a C struct field selector could still follow `b` and change its meaning, following multiplications, divisions, additions, subtractions, or comparisons will not, and preceding additions, subtractions, or comparisons also will not. I conjecture that this facilitates subconscious pattern recognition (a tiny example below).
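A tiny q illustration of that context-(in)dependence, with arbitrary numbers:

    2+3*4-1    / in q this is 2+(3*(4-1)) = 11: the visual unit "3*4" is never computed
               / under school precedence it is 2+(3*4)-1 = 13, and "3*4" means 12 wherever it appears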
But the familiarity factor enormously outweighs these theoretical considerations for me.
On the contrary, I find it much more usable for left-to-right readers, because it allows a "top-down" reading of the expressions, instead of a "bottom-up" reading.
When trying to understand an unfamiliar program, for debugging or maintenance, you normally do not want to waste time reading every expression in full, since most of the computation details are irrelevant.
You typically search for where certain variables are modified, and how and why. For this it is frequently enough to look only at the last operations that were performed before storing a modified value into a variable.
With the Iverson notation, the last operations are always conveniently grouped at the left side of a text line. Thus you read from left to right only as much as necessary to find what you need, then you can skip the rest of the line.
With the school notation, the required information is not grouped at one end of the line, so reading becomes slower.
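For example, in q (variable names invented for illustration):

    fees: 100f
    prices: 10 12 11 15f
    net: fees - sum deltas prices   / the leftmost "-" is the last operation performed:
                                    / "net is fees minus ..." - you can stop reading there
    / with school notation, e.g. a*b + c*d - e, the final low-precedence operator sits
    / mid-line, so you must scan the whole expression to find the outermost operation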
The opposite of the Iverson notation, which was used in some stack-oriented languages, also groups the information, but in a way that is less usable for left-to-right users.
From natural languages, left-to-right readers expect that a sentence starts with its topic (at the left side), i.e. the most important part, e.g. the last assignment, like in the Iverson notation, instead of ending with its topic, like in the opposite notation.
> "a reader must maintain a stack of pending operators and values in their mind"
I believe that few readers, if any, do this.
The normal case when reading is that you do not want to reproduce in your mind what the computer does, but only to find the information flows between program variables. For this, it is enough to read partial expressions, as explained above.
In the very rare case when you wanted to make a mental calculation identical to that of the computer, you would normally read the expression from right to left.
When writing, the Iverson notation is usually more convenient than the school notation, while writing normally, from left to right. The reason is that for most computations the natural way to find the expression that must be computed is to go backwards, from the desired result towards the available data.
Everybody learns in school the traditional convention for writing mathematical expressions.
It appears that for most people it is difficult or impossible to unlearn such a convention later, even if they encounter a superior convention.
On the other hand, I am among the few for whom this is not true, so when I first read K. Iverson's book "A Programming Language", on which the later APL language and its successors were based, I immediately recognized that the Iverson convention is much better than the school convention, and I have no trouble using it.
When reading a program written with the Iverson convention, you still read from left to right, but you typically do not read to the end of the line, only as much of the left part as necessary to understand the purpose of the line. (The right operand of any operator is everything that follows it until the end of the line, and the details of that computation may be irrelevant. With school notation, when searching for where a variable has been modified and how, you must jump between the beginning of the line and the end of the line to find the last operations that generated the stored value, when reading and understanding the complete expression would be a waste of time.)
The original motivation of the Iverson convention, which remains very important, was to give a useful meaning for a sequence of identical non-commutative operators, e.g. subtraction and division. This is particularly desirable when the operators are used in vector reductions.
(With school notation, a0 - a1 - a2 - ... - an is seldom a useful expression, but with the Iverson convention it becomes alternate sum, which is needed very frequently. Similarly for division.)
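In q, which keeps the right-to-left rule, this is easy to see (literal values only for illustration):

    1 - 2 - 3 - 4    / parsed as 1-(2-(3-4)) = -2, i.e. the alternating sum 1-2+3-4
                     / with left-to-right grouping it would be ((1-2)-3)-4 = -8, rarely useful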
It's actually not, and unless they in some way run a rule engine on top of their LLM SaaS stuff, it seems far-fetched to believe it adheres to rule sets in any way.
Local models confuse Python, Elixir, PHP and Bash when I've tried to use them for coding. They seem more stable for JS, but sometimes they slip out of that too.
Seems pretty contrived and desperate to invent transpilers from quasi-Python to other languages to try and find a software development use for LLM SaaS. Warnings about Lisp macros and other code rewrite tools ought to apply here as well. Plus, of course, the loss of 'notation as a tool of thought'.
My hot take is that Iverson was simply wrong about this. He couldn't be expected to predict code completion, and later LLMs, both of which want later tokens to depend on earlier tokens. SQL messed it up too, with "from" not coming first. If APL were developed today, I think left-to-right evaluation would have been preferred. The popularity of dotted function calls in various languages makes it reasonably clear that people like tacking things onto the end and seeing a "pipeline" form from left to right.
The 10 by 10 reshaping of counting to 100
Your insight about APL being reverse-concatenative is very cool.
Makes me wonder if future LLMs will be composing nonlinear things and be able to work in non-token-order spaces temporarily, or will have a way to map their output back to linear token order. I know nonlinear thinking is common while writing code, though. Current LLMs might be hiding a deficit by having a large and perfect context window.
Shouldn't be hard to train a coding LLM to do this too by doubling the training time: train the LLM both forwards and backwards across the training data.
Is there a top 100 package that does something funny on import?
My guess is that nearly all packages that did this sort of thing were left behind in the 2-to-3 migration, which a lot of us used as the excuse for a clean break.
Usually when someone solves problems with q, they don't approach them the way one would in Python/Java/C/C++/C#/etc.
This is probably a poor example, but if I asked someone to write a function to create an nxn identity matrix for a given number, the non-q solution would probably involve some kind of nested loop that checks if i==j and assigns 1, otherwise assigns 0.
In q you'd still check equivalence, but instead of looping, you generate a list of numbers as long as the given dimension and then compare the list against each of its items:
{x=/:x:til x}3
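For anyone who doesn't read q, a rough breakdown of that one-liner (comments added for illustration):

    {x=/:x:til x}3
    / til 3    -> 0 1 2            (the outer x is the argument, 3)
    / x:til x  -> rebinds x to 0 1 2
    / x=/:x    -> compares the list against each of its items (each-right),
    /             giving (100b;010b;001b), a 3x3 boolean identity matrix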
An LLM that's been so heavily trained on an imperative style will likely struggle to solve similar (and often more complex) problems in a standard q manner.
Some hacks for time / position / space flipping the models:
- test the spate of diffusion models emerging; pro is faster, con is smaller context; ymmv depending on whether it was trained on that language and/or the context is large enough to ICL lang booster info
- exploit known LTL tricks that may work; there's a bunch of these
- e.g., tell the model to gen drafts in some sort of RPN variant of the lang; if it tests well, tell it to simulate creating such a fork of the lang and then gen the clean standard form at the end
- have it be explicit about leapfrogging recall and reasoning, e.g. be excessively verbose with comments that you can regex-strip later
- have it build a stack / combo of the RPN & CoT & bootstrapping its own ICL
- exploit causal markers - think tags that can splinter time - this can really boost any of the above methods - e.g. give each instance of things disjoint time tags, A1 vs K37 for numbered instances of things that share a given space - like a time GUID
- use orthogonal groups of such tags to splinter time and space recall and reasoning in the model, including seemingly naive things like "pass 1" etc
- our recent arXiv paper on HDRAM / hypertokens pushes causal markers to a classic-quantum holographic extreme and was built for this; the next version will be more accessible
- the motivators are simple: models fork on prefix-free modulo embedding noise, so the more you make prefix-free, the better the performance. There are some massive caveats on how to do this perfectly, which is exactly our precise work - think 2x to 10x gain on the model and similar on reasoning; again ymmv as we update the preprint, post a second paper that makes the baseline better, prep a git release etc. to make it much easier to get better recall and exploit the same to get better reasoning by making it possible for any model to do the equivalent of arbitrary RPN
- our future state is exactly this: a prompt compiler for exactly this use case - explainable, time-independent computation in any model
Language designers would be smart to recognize this fact and favor making their languages more LLM friendly. This should also make them more human friendly.
Most techies (generalizing here) start with a reasonably clear spec that needs to be implemented and they can focus on how to architect the code.
Research - whether science, finance or design - is much more iterative and freeform. Your objective is often very fuzzy. You might have a vague idea what you want, but having to think about code structure is annoying and orthogonal to your actual purpose.
This is why languages like Ruby work well for certain purposes. They allow the person to prototype extremely rapidly and iterate on the idea. It will eventually reach a breaking point where global state starts being an impediment, but an experienced dev will have started refactoring stuff earlier than that as various parts of the implementation become stable.
That says nothing about the language at all, actually. Just that it's small and easily confused for something more idiomatic to a newbie.
At first they were hilariously bad, then just bad, then kind of okay, and now Anthropic's Claude 4 Opus reads and writes it just fine.
I wonder if diffusion models would be better at this; most start out as sequential token generators and then get finetuned.