
The AGI Final Frontier: The CLJ-AGI Benchmark

https://raspasov.posthaven.com/the-agi-final-frontier-the-clj-agi-benchmark
19•raspasov•6mo ago

Comments

malux85•6mo ago
Perhaps this is a really great AGI test - not in the sense that the AGI can complete the given task correctly, but whether the AGI can interpret incredibly hand-wavy requirements like “do XXX (as much as possible)” and implement them: A, B, C, etc.
delegate•6mo ago
Doesn't Clojure already support all of those features?

Eg.

> transducer-first design, laziness either eliminated or opt-in

You can write your code using transducers or opt-in for laziness in Clojure now. So it's a matter of choice of tools, rather than a feature of the language.

> protocols everywhere as much as practically possible (performance)

Again, it's a choice made by the programmer, the language already allows you to have protocols everywhere. It's also how Clojure is implemented under the hood.
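For context, a minimal sketch of what "protocols everywhere" looks like in Clojure today (the `Stack` protocol and its method names are illustrative, not from the thread):

```clojure
;; Define an abstraction as a protocol rather than a Java interface.
(defprotocol Stack
  (push-item [s x] "Push x onto the stack.")
  (pop-item  [s]   "Return the stack without its top item."))

;; Extend the protocol to an existing type: a plain persistent vector.
(extend-type clojure.lang.IPersistentVector
  Stack
  (push-item [s x] (conj s x))
  (pop-item  [s]   (pop s)))

(push-item [1 2] 3) ;; => [1 2 3]
(pop-item  [1 2 3]) ;; => [1 2]
```

The open question raised below is not whether this is possible, but what Clojure would look like if its own core abstractions had been built this way from the start.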

> first-class data structures/types are also CRDT data types, where practical (correctness and performance)

Most of the programs I worked on did not require CRDTs. I'm inclined to choose a library for this.

> first-class maps, vectors, arrays, sets, counters, and more

Isn't this already the case? If Clojure's native data structures are not enough, there's an ocean of Java options.

Which leads to a very interesting question:

How should the 'real' AGI respond to your request?

raspasov•6mo ago
> first-class maps, vectors, arrays, sets, counters, and more

That's my mistake; this line was intended to be a sub-bullet point of the previous line regarding CRDTs.

> the language already allows you to have protocols everywhere

The core data structures, for example, are not based on protocols; they are implemented in pure Java. One reason is that the 1.0 version of the language lacked protocols. All that being said, it remains an open question what the full implications of the protocol-first idea are.

> You can write your code using transducers or opt in for laziness in Clojure now. So it's a matter of choice of tools, rather than a feature of the language.

You 100% can. Unfortunately, many people don't. The first thing people learn is (map inc [1 2 3]), which produces a lazy sequence. Clojure would never change this behavior, as the authors value backward compatibility almost above everything else, and rightly so. A transducer-first approach would be a world where (map inc [1 2 3]) produces the vector [2 3 4] by default, for example.
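The contrast can be sketched in a few lines of plain Clojure (nothing here beyond the example data is hypothetical):

```clojure
;; Today's default: map returns a lazy sequence.
(map inc [1 2 3])
;; => (2 3 4), a lazy seq realized on demand

;; Transducer-first, eager style: same logic, no lazy seq in between.
(into [] (map inc) [1 2 3])
;; => [2 3 4]

;; Transducers also compose without intermediate collections:
(into [] (comp (map inc) (filter even?)) [1 2 3 4])
;; => [2 4]
```

A transducer-first language would make the second form the default spelling, which is exactly the backward-compatibility break Clojure won't make.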

This was mentioned by Rich Hickey himself in his "A History of Clojure" paper:

https://clojure.org/about/history https://dl.acm.org/doi/pdf/10.1145/3386321

(from paper) > "Clojure is an exercise in tool building and nothing more. I do wish I had thought of some things in a different order, especially transducers. I also wish I had thought of protocols sooner, so that more of Clojure’s abstractions could have been built atop them rather than Java interfaces."

mdemare•6mo ago
More AGI Final Frontiers:

"Reimplement Sid Meier's Alpha Centauri, but with modern graphics, smart AIs that role-play their personalities, all bugs fixed, a much better endgame, AI-generated unexpected events, and a dev console where you can mod the game via natural language instructions."

"Reimplement all linux command line utilities in Rust, make their names, arguments and options consistent, and fork all software and scripts on the internet to use the new versions."

raspasov•6mo ago
"Reimplement Linux in Rust" would be a good one!
glimshe•6mo ago
Let's say we had a ChatGPT-2000 capable of all of this. What would digital life look like? What would people do with their computers?
Lerc•6mo ago
Even if we were not past a hard takeoff point where AIs could decide for themselves what to work on, the things that would be created in all areas would be incredible.

Consider every time you played a game and thought it would be better if it had x, y, or z. Or you wished an application had this one simple new feature.

All those things would be possible to make. A lot of people will discover why their idea was a bad idea. Some will discover their idea was great, some will erroneously think their bad idea is great.

We will be inundated with the creation of those good and bad ideas. Some people will have ideas on how to manage that flood of new creations and will create tools to help out; some of those tools will be good and some will be bad. There will be a period of churn where finding the good and ignoring the bad is difficult; a badly made curator might make bad ideas linger.

That's just in the domain of games and applications. If AI could manage that level of complexity, you can ask it to develop and test just about any software idea you have.

I barely go a day without thinking of something that I could spend months of development time on.

Some idle thoughts that such a model could develop and test.

Can you make a transformer that, instead of linear-space V modifiers, used geodesics? Is it better? Would it better support scalable V values?

Can you train a model to identify which layer is the likely next layer purely based upon the input given to that layer? If it only occasionally gets it wrong, does the model perform better if you give the input to the layer that the predictor thought was the next layer? Can you induce looping/skipping layers this way?

If you train a model with the layers in a round-robin ordering on every input, do the layers regress to a mean generic layer form, or do they develop into a general information improver that works purely by the context of the input?

What if you did every layer on a round robin twice, so that every layer was guaranteed to be followed by any of the other layers at least once?

Given that you can quadruple the parameters of a model without changing its behaviour using the Wn + Randomn, Wn - Randomn trick, can you distill a model to 0.25x size and then quadruple it, making a model that retains the original size but takes further learning better, broadening parameter use?

Can any of these ideas be combined with the ones above?

Imagine instead of having these idle ideas, you could direct an AI to implement them and report back to you the results.

Even if 99.99% of the ideas are failures, there could be massive advances from the fraction that remains.

SequoiaHope•6mo ago
That’s still just code! How about “design a metal 3D printing machine which can be built for $2000 and can make titanium, steel, aluminum, and copper parts with 100 micron precision, then design a simple factory for that machine. Write the manufacturing programs for all of the CNC machines, and work instructions for every step of the process. Order the material and hire qualified individuals to operate the machines. Identify funding opportunities and raise funds.”

I could go on. One of the challenges here is that many things like this cannot be designed by simply thinking, unless you have extremely superhuman performance, because complex subassemblies have to be built, prototyped, and debugged. And right now there are no good datasets for machine design, PCB design, machine tool programming, hiring, VC fund raising, negotiating building leases, etc.

We will never have real AGI unless it can learn how to improve without extensive datasets.

kloud•6mo ago
My "pelican test" for coding LLMs now is building a proof-of-concept UI (a hello-world app) using Jetpack Compose in Clojure. Since Compose is implemented as Kotlin compiler extensions and does not provide Java APIs, it cannot be used from Clojure via interop.

I outlined a plan letting it analyze Compose code, suggesting it could first reverse engineer the bytecode of a Kotlin demo app and then either emit bytecode from Clojure or implement it in Clojure directly based on the analysis. Claude Code with Sonnet 4 was confident it could implement it directly and failed spectacularly.

Then as a follow-up I had it compile the Kotlin demo app and tried to bundle those classes using Clojure tooling, to at least make sure it got the dependencies right as a starting point. It resorted to cheating by shelling out to gradlew from Clojure :) I am going to wait for the next round of SOTA models to burn some tokens again.

Grimblewald•6mo ago
Mine is seeing if they can implement Brown et al.'s (2007) image stitching algorithm. It's old, plenty of code examples exist in training data, and the math at this stage is quite well developed, but funnily enough, no decent real open source examples of this exist, especially anything that gets close to Microsoft Research's demo tool, the Image Composite Editor (ICE). Even if you heavily constrain the requirements, i.e. planar motion only, only multi-band blending and gain correction, not a single model currently manages to pull this off. Few even have something working at the start.

Many other things they excel at, even looking downright competent, but in all those cases it simply turns out a decent open source example of the implementation exists on GitHub, usually a touch better than the LLM version. I have yet to see an LLM produce good code for something even moderately complex that I couldn't then find a copy of online.
upghost•6mo ago
This is a good one. Forget AGI, I'd settle for an LLM that when doing Clojure doesn't spew hot trash. Balancing parens on tab complete would be a nice start. Or writing sensible ClojureScript that isn't reskinned JavaScript with parens would be pretty stellar.
raspasov•6mo ago
Haha, the higher-end LLMs are not absolutely terrible. In my experience, LLMs in their current form are better at explaining code than creating it. Not perfect by any stretch in either task.

Balancing parens is still a challenge.

Lerc•6mo ago
The notion of when a language is created is open to interpretation.

It is not stated whether you want such a language described, specified, or implemented.

raspasov•6mo ago
I think "created" is generally considered to be implemented :).

I also discuss performance, so I think implementation is definitely strongly implied.

kelseyfrog•6mo ago
I get it now. Benchmarks, in the end, are prompts for AI researchers.

If you want a problem solved, translate it into an AGI benchmark.

With enough patience, it becomes something AI researchers report on, optimize for, and ultimately saturate. Months later, the solution arrives; all you had to do was wait. AI researchers are an informal, lossy form of distributed computation - they mass-produce solutions and tools that, almost inevitably, solve the messy problem you started with.

rs186•6mo ago
What comes to mind is whether AGI can gracefully solve the Go error handling problem, once and for all.

https://go.googlesource.com/proposal/+/master/design/go2draf...