Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (2022)

https://arxiv.org/abs/2211.12588
136•mkagenius•2mo ago

Comments

jhart99•2mo ago
Underlying paper is from 2022 and should be indicated in the title.
jey•2mo ago
This seems to be incorporated into current LLM generations already -- when code execution is enabled both GPT-5.x and Claude 4.x automatically seem to execute Python code to help with reasoning steps.
logicprog•2mo ago
Yeah, this is honestly one of the coolest developments of new models.
Esophagus4•2mo ago
Same with CoT prompting.

If you compare the outputs of a CoT input vs a control input, the outputs will have the reasoning step either way for the current generation of models.

fsfod•2mo ago
I remember seeing that GPT-5 had two Python tools defined in its leaked prompt; one of them would hide the output from the user-visible chain-of-thought UI.
axiom92•2mo ago
This was integrated in gpt4 2 years ago:

https://www.reddit.com/r/ChatGPT/comments/14sqcg8/anyone_els...

eric-burel•2mo ago
I call that self-destructive prompting, in the sense that you use AI to output programs that replace calling the AI in the future. The paper seems to indicate that this also brings much better results. However, it's open to attacks, as running generated code is usually unsafe. A sandbox has to be used; major agentic AI players provide some solutions, like the LangChain sandbox released earlier this year.
samus•2mo ago
If the generated code uses a suitable programming language, like the safe subset of Haskell, then the risk is significantly lower. Anyway it makes sense to execute this code in the user's browser instead of on the server.
eric-burel•2mo ago
Yeah, I mean you can replace sandboxing by other safe alternatives, but the idea is the same: the generated code has to be considered 100% untrusted. Supply chain attacks are especially nasty.
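
A minimal sketch of the "treat generated code as untrusted" point, assuming the generated code is Python. It only gives process separation and a timeout, which is not a real sandbox: there is no filesystem or network restriction here. `run_untrusted` is a hypothetical helper name.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> str:
    """Run generated code in a separate interpreter process and return its stdout.

    Assumption: process separation plus a timeout is enough for this demo.
    A real deployment would add a container, VM, or similar isolation layer.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on runaway code
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


print(run_untrusted("print(2 + 3)"))
```

The parent process never calls `exec` on the generated text itself, so a crash or infinite loop in the generated program stays in the child process.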
mgraczyk•2mo ago
Anthropic recently added this to the API: https://www.anthropic.com/engineering/advanced-tool-use

See "Programmatic Tool Calling"

And there was an AI productivity startup called Lutra AI doing this, although they've since pivoted to some kind of MCP infra thing: https://lutra.ai/

axiom92•2mo ago
https://www.reddit.com/r/ChatGPT/comments/14sqcg8/anyone_els...
mgraczyk•2mo ago
That is different, it's a code interpreter where the model can run code and see the outputs. It's not a different way of doing tool calls
axiom92•2mo ago
And even before this work, there was "PAL: Program-aided Language Models" (https://arxiv.org/abs/2211.10435, https://reasonwithpal.com/).

Afaik PaLM (Google's OG big models) tried this trick, but it didn't work for them. I think PAL worked because it used descriptive inline comments + meaningful variable names. Compare the following:

```python
# calculate the remaining apples
apples_left = apples_bought - apples_eaten
```

vs.

```python
x = y - z
```

We have ablations in https://arxiv.org/abs/2211.10435 showing that both are indeed useful (see "Crafting prompts for PAL").
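
For illustration, here is a complete program in the descriptive style from the first snippet. The word problem, the numbers, and the `solution` wrapper are assumptions made up for this sketch, not taken from the paper's prompts.

```python
def solution():
    """Sam bought 12 apples and ate 5 of them. How many apples are left?"""
    # Values read off the (hypothetical) word problem above.
    apples_bought = 12
    apples_eaten = 5

    # calculate the remaining apples
    apples_left = apples_bought - apples_eaten
    return apples_left


print(solution())
```

The interpreter does the arithmetic; the comments and variable names carry the link back to the problem text.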

mvkel•2mo ago
Worth noting: this paper was published three days before the release of GPT-3.5
robot-wrangler•2mo ago
Chain-of-code is better than chain-of-thought because it's more grounded, more specific, and achieves a lot of useful compression. But my bet is that the proposed program-of-thought is too specific. Moving all the way from "very fuzzy specification" to "very concrete code" skips all of the space in the middle, and now there's no room to iterate without a) burning lots of tokens and b) getting bogged down in finding and fixing whatever new errors are introduced in the translated representations. IOW, when there's an error, will it be in the code itself or in the scenario that code was supposed to be representing?

I think the intuition that lots of people jumped to early about how "specs are the new code" was always correct, but at the same time it was absolutely nuts to think that specs can be represented in good ways with natural language and bullet-lists in markdown. We need chain-of-spec that's leveraging something semi-formal and then iterating on that representation, probably with feedback from other layers. Natural language provides constraints, and guess-and-check code generation sits at the implementation level, but neither is actually the specification, which is the heart of the issue. A perfect intermediate language will probably end up being something pretty familiar that leverages and/or combines existing formal methods from model-checkers, logic, games, discrete simulations, graphs, UML, etc. Why? It's just very hard to beat this stuff for compression, and this is what all the "context compaction" things are really groping towards anyway. See also the wisdom about "programming is theory building" and so on.

I think if/when something like that starts getting really useful you probably won't hear much about it, and there won't be a lot of talk about the success of hybrid-systems and LLMs+symbolics. Industry giants would have a huge vested interest in keeping the useful intermediate representation/languages a secret-sauce. Why? Well, they can pretend they are still doing something semi-magical with scale and sufficiently deep chain-of-thought and bill for extra tokens. That would tend to preserve the appearance of a big-data and big-computing moat for training and inference even if it is gradually drying up.

tcoff91•2mo ago
Perhaps something like TLA+ or PlusCal specs could be the specs in terms of 'specs are the new code'.
baq•2mo ago
I’ve been looking into this idea for a couple of weeks and have had some success generating Alloy specs as an intermediate between high-level arch docs and product code.
robot-wrangler•2mo ago
Any automation/agents/etc around that which you could share, or just a pretty manual process? I'm working on something similar.

After hitting the inevitable problems with LLMs trying to read/write more obscure targets like Alloy, I've been trying to decide whether it's better to a) create a Python wrapper for the obscure language, b) build the MCP tool-suite for validate/analyze/run, or c) go all the way towards custom models, fine-tuning, synthetic data and all that.

baq•2mo ago
I’m purely in experiment stage, no automation, no agents; just ‘these are the design docs, this is the existing code base, let’s get a simple alloy model started’ and interactively building from there. I was concerned about the same things you mention, but starting very small with a tight development loop worked well with GPT 5.1 high. I wouldn’t try to zero shot the whole model unsupervised… yet.

The first step before a python/TS wrapper would be to put a single file manual into the context as is customary for non-primary targets, but I didn’t even reach the stage where this is necessary ;)

Uptrenda•2mo ago
Delusional vibe coding bullshit. Find me one significant software project based on using natural language for the software.
graemefawcett•2mo ago
I posted the RSpec version of this earlier this week on Obie's keynote on Ruby + TDD + AI.

I've been working on a project to turn markdown into a computational substrate, sort of Skills+. It embeds the computation in the act of reading the file so you don't have to teach anything (or anyone) how to do anything other than read the data on the page along with the instructions on what to do with it. It seemed the simplest way of interacting with a bunch of machines that really love to read and write text.

I use a combination of reference manuals and user guides to replace the specs as a description of intent for the input to the process. They need to be written and accurate anyway, and if they're the input to the process, then how can they not be? After all

requirements = specs = user stories = tests = code = manuals = reference guides = runbooks

They're all different projections of the same intent and they tend to be rather difficult to keep in sync, so why not compress them?

https://tech.lgbt/@graeme/115642072183519873

This lets one artifact play all of the roles in the process, or for anything non-trivial, you can use compositional vectors like ${interpolation:for-data.values}, {{include:other:sections-as.whole-units}} or run special ```graphnode:my-funky-cold-medina:fetch fences that execute code found on other nodes and feed the output through middleware to transform and transpile it into the parent document.

Think of it like Rack for your thoughts.

I threw the AST thing on it because I'd been playing with that node and thought a symbol table would be useful for two reasons. Hard to hallucinate a symbol table that's being created deterministically and if I've got it, it saves scanning entire files when I'm just looking for a method.
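
A rough sketch of the deterministic symbol table idea, assuming Python sources and the standard `ast` module. The commenter's actual tooling and node format are not shown in the thread, so `build_symbol_table` and the sample source are hypothetical.

```python
import ast


def build_symbol_table(source: str) -> dict[str, int]:
    """Map each function and class name in `source` to its definition line.

    Simplification: names are not qualified, so a later definition with the
    same name overwrites an earlier one.
    """
    table: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            table[node.name] = node.lineno
    return table


example = "def foo():\n    pass\n\nclass Bar:\n    def baz(self):\n        return 1\n"
print(build_symbol_table(example))
```

Because the table comes from the parser rather than from the model, a lookup like "where is `baz` defined?" can be answered without scanning or regenerating the whole file.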

New computing paradigms sometimes require new tools.

I think you're absolutely right about the rest of it. LLM assisted development is the process of narrowing the solution space by contextual constraints and then verifying the outcome matches the intent. We need to develop tools on both ends of that spectrum in order to take full advantage of them.

Try /really/ telling one what you want next time, instead of how to do it. See if your results are any better

eisbaw•2mo ago
sounds very interesting but also hard to understand - do you have an introduction or some examples?
schmuhblaster•2mo ago
> But my bet is that the proposed program-of-thought is too specific

This is my impression as well, having worked with this type of stuff for the past two years. It works great for very well-defined use cases and if user queries do not stray too far from what you optimized your framework/system prompt/agent for. However, once you move too far away from that, it quickly breaks down.

Nevertheless, as this problem has been bugging me for a while, I still haven't given up (although I probably should ;-). My latest attempt is a Prolog-based DSL (http://github.com/deepclause/deepclause.ai) that allows for part of the logic to be handled by LLMs again, so that it retains some of the features of pure LLM-based systems. As a side effect, this gives additional features such as graceful failures, auditability and increased (but not full) reproducibility.

yencabulator•1mo ago
That repo is not public.
koakuma-chan•2mo ago
What is "program-of-thought" ?
Nevermark•2mo ago
> What is "program-of-thought" ?

"One of the hardest things in programming?" for $1000, Alex.

larodi•2mo ago
chain of shit. learn Prolog, bois.
samus•2mo ago
Or expect the LLM to only output Prolog, with the assumptions and knowledge as clauses?
larodi•2mo ago
Vlad Tenev's new startup apparently does exactly this, but with Lean4. So what's not to expect here? The last two years saw so many neural-symbolic systems released that it's very hard to not see where this all goes...
gishh•2mo ago
“Statistical matrix math outperforms statistical matrix math!” More at 11
vatican_banker•2mo ago
DSPy has implemented program of thought for a long time, and it works great for solving user queries with code.

What is great is that you can define a DSPy signature of the type “question, data -> answer” where “data” is a pandas dataframe; DSPy then prompts the LLM to answer the question using the data and Python code. Extremely powerful.

jauntywundrkind•2mo ago
Steve Krouse had an amazing rant two weeks back on/against MCP, and how asking AI to write code to call MCP servers has eaten away at actually calling tools. It feels similar, of code being a more grounded system. https://x.com/stevekrouse/status/1988641250329989533
hibharani•2mo ago
This looks like a version of json prompting + tool use. Is it?
nihonde•2mo ago
The paper focuses on math and finance—i.e., quantitative matters.

It seems obvious that language models are not suitable for deterministic number-crunching unless they generate a program to compute the response as an interim step.
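
A small sketch of that interim step, assuming the model returns Python source that stores its result in a variable named `ans`. The program text below is a hand-written stand-in, not real model output, and the finance numbers are made up.

```python
# Stand-in for text returned by a model (hypothetical).
GENERATED_PROGRAM = """
principal = 1000
annual_rate = 0.05
years = 3
# compound the principal once per year
ans = principal * (1 + annual_rate) ** years
"""


def run_program(program: str):
    """Execute the generated program and return the value it stored in `ans`."""
    namespace = {}
    # Emptying builtins limits accidental damage but is NOT a security boundary;
    # untrusted code still needs real isolation.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace["ans"]


answer = run_program(GENERATED_PROGRAM)
print(f"{answer:.2f}")
```

The model only has to get the setup right (which quantities, which formula); the exponentiation itself is done by the interpreter instead of being predicted token by token.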

gloosx•2mo ago
The abstract actually says 12%
samus•2mo ago
I just gave Qwen3-Next-Instruct a system prompt that instructs it to always answer with a Prolog program that computes the answer.

When I asked whether the Collatz sequence terminates, it indeed answered with a Prolog program that computes the sequence. The interesting thing is that it added comments that looked like the chain-of-thought to justify calculating the sequence only for a finite number of n.

An excerpt:

```prolog
% However, the question expects a yes/no based on known mathematics.
% Since the Collatz conjecture is still open, the correct logical answer in Prolog is:
% We cannot prove it, so we do not assert it as true. But the question implies:
% "Does it always end at 1?" — if we are to encode the function and test for all n,
% we cannot quantify over all positive integers in finite Prolog.

% Therefore, we design the program to test n=1 to 100, and if all terminate, we say "true"
% as an empirical approximation — but note: this does not prove it for ALL.

% But the user asks: "Does this sequence end at 1 for all positive integers?"
% The only correct logical answer is: we don't know. But since Prolog must compute,
% and the conjecture is widely believed true, yet unproven, we return false
% because we cannot prove it for all.

% Alternatively, we can answer by showing that for all n from 1 to 1000, it terminates.
% And if asked "is it true for all?" — we can only say: verified up to 1000.

% However, per instructions: encode the question as a query that computes the answer.
% We interpret: "Does it end at 1 for all positive integers?" → Is it true that ∀n>0, collatz(n) ends at 1?
% Since we cannot verify ∀, we return false (because it's not logically proven).
```

My next experiment is going to use a more powerful proof calculus like Rocq or Isabelle.
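
For comparison, the bounded check that the generated comments describe ("verified up to 1000") is easy to sketch in Python. This is an illustration, not the Prolog the model produced; `collatz_steps` and the `max_steps` cutoff are assumptions added for the sketch.

```python
def collatz_steps(n: int, max_steps: int = 10_000):
    """Return the number of steps for n to reach 1, or None if the cutoff is hit."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None  # gave up; says nothing about whether n terminates
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# Empirical check only: this verifies a finite range and proves nothing for all n.
all_terminate = all(collatz_steps(n) is not None for n in range(1, 1001))
print(all_terminate)
```

As the model's own comments note, a finite sweep like this can only report "verified up to 1000"; the universally quantified claim stays open.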