
LLM function calls don't scale; code orchestration is simpler, more effective

https://jngiam.bearblog.dev/mcp-large-data/
132•jngiam1•5h ago

Comments

avereveard•5h ago
That's kind of the entire premise of Hugging Face's smolagents, and while it does work really well when it works, it also increases the challenge of rolling back failed actions.

I guess one could in principle wrap the entire execution block in a distributed transaction, but LLMs try to write code that is robust, which works against this pattern because it makes failures harder to understand.

jngiam1•4h ago
Agree, the smolagent premise is good; but the hard part is handling execution, errors, etc.

For example, when the code execution fails mid-way, we really want the model to be able to pick up from where it failed (with the states of the variables at the time of failure) and be able to continue from there.

We've found that the LLM is able to generate correct code that picks up gracefully. The hard part now is building the runtime that makes that possible; we have something that works pretty well in many cases, now in production at Lutra.
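A minimal sketch of what checkpoint-and-resume execution could look like (all names here are hypothetical; a real runtime would need sandboxing and smarter state capture than pickling picklable variables):

```python
import pickle

def _picklable(v):
    try:
        pickle.dumps(v)
        return True
    except Exception:
        return False

def run_resumable(statements, state_path, namespace=None):
    """Execute statements one at a time; on failure, checkpoint the
    variable namespace so a later attempt can resume mid-script."""
    ns = namespace if namespace is not None else {}
    for i, stmt in enumerate(statements):
        try:
            exec(stmt, ns)
        except Exception as e:
            # Persist picklable variables plus the index of the failed step.
            snapshot = {k: v for k, v in ns.items()
                        if not k.startswith("__") and _picklable(v)}
            with open(state_path, "wb") as f:
                pickle.dump({"step": i, "vars": snapshot}, f)
            raise RuntimeError(f"failed at step {i}: {e}") from e
    return ns

def resume(statements, state_path):
    """Reload the checkpointed variables and continue from the failed
    step (in practice, with a corrected version of that step)."""
    with open(state_path, "rb") as f:
        chk = pickle.load(f)
    ns = dict(chk["vars"])
    return run_resumable(statements[chk["step"]:], state_path, ns)
```

The key point is that the checkpoint holds variable state, not just a step counter, so regenerated code can reference results already computed.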

hooverd•4h ago
Could you implement an actual state machine and have your agent work with that?
avereveard•3h ago
That's the LangGraph idea: each LangGraph node can then be a smolagent.

Latency, though, would be unbearable for real time.

avereveard•3h ago
I think in principle you can make the entire API exposed to the LLM idempotent, so that it becomes irrelevant to the backend whether the LLM replays the whole action or just the failed steps.
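A sketch of an idempotency-key wrapper of the kind described (the `IdempotentAPI` class and backend signature are illustrative, not any real API; a real service would scope keys per user/session and persist them):

```python
import hashlib
import json

class IdempotentAPI:
    """Wrap a side-effecting backend call so that replaying the same
    logical request (e.g. when an LLM re-runs a whole script) is a no-op."""

    def __init__(self, backend):
        self.backend = backend
        self._seen = {}  # idempotency key -> cached result

    def call(self, action, **params):
        # Derive the key from the action and its arguments, so an exact
        # replay maps to the same key and returns the cached result.
        key = hashlib.sha256(
            json.dumps([action, params], sort_keys=True).encode()
        ).hexdigest()
        if key not in self._seen:
            self._seen[key] = self.backend(action, **params)
        return self._seen[key]
```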
jngiam1•3h ago
That'd work well for read-only APIs, but we also want the LLMs to be able to update data, create documents, etc. Feels a bit harder when there are side-effects.
abelanger•5h ago
> Most execution environments are stateful (e.g., they may rely on running Jupyter kernels for each user session). This is hard to manage and expensive if users expect to be able to come back to AI task sessions later. A stateless-but-persistent execution environment is paramount for long running (multi-day) task sessions.

It's interesting how architectural patterns built at large tech companies (for completely different use-cases than AI) have become so relevant to the AI execution space.

You see a lot of AI startups learning the hard way the value of event sourcing and (eventually) durable execution, but these patterns aren't commonly adopted on day 1. I blame the AI frameworks.

(disclaimer - currently working on a durable execution platform)

th0ma5•4h ago
I see all of this as a constant negotiation of what is and isn't needed out of traditional computing. Eventually they find that what they want from any of it is determinism, unfortunately for LLMs.
visarga•4h ago
Maybe we just need models that can reference spans by start:end range. Then they can pass arguments by reference instead of explicit quotation. We can use these spans as answers in extractive QA tasks, or as arguments for a code block, or to construct a graph from pointers, and do graph computation. If we define a "hide span" operation the LLM could dynamically open and close its context, which could lead to context size reduction. Basically - add explicit indexing to context memory, and make it powerful, the LLM can act like a CPU.
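A toy illustration of span-addressed context memory (a hypothetical interface; real context management would happen inside the inference engine, not in Python):

```python
class SpanContext:
    """Context memory addressed by start:end character spans, so a model
    can pass arguments by reference instead of re-quoting text."""

    def __init__(self, text):
        self.text = text
        self.hidden = []  # list of (start, end) spans masked out

    def deref(self, start, end):
        # Resolve a span reference back to the literal text.
        return self.text[start:end]

    def hide(self, start, end):
        # "Close" part of the context to shrink what the model attends to.
        self.hidden.append((start, end))

    def visible(self):
        # Render the context with hidden spans collapsed to a marker.
        out, pos = [], 0
        for s, e in sorted(self.hidden):
            out.append(self.text[pos:s])
            out.append(f"[hidden {s}:{e}]")
            pos = e
        out.append(self.text[pos:])
        return "".join(out)
```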
hintymad•4h ago
I feel that the optimal solution is hybrid, not polarized. That is, we use deterministic approaches as much as we can, but leverage LLMs to handle the remaining complexity that is hard to spec out or describe deterministically.
jngiam1•4h ago
Yes - in particular, I think one interesting angle is to use the LLM to generate deterministic approaches (code). Then, if the code works, save it for future use, and the task becomes deterministic going forward.
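A sketch of that save-working-code-for-reuse idea (the `generate` callback stands in for an LLM call; keying the cache on the task string is a simplification):

```python
import hashlib

class CodeCache:
    """Once LLM-generated code for a task has run successfully, reuse it
    verbatim so the task is deterministic on later runs."""

    def __init__(self, generate):
        self.generate = generate   # fallback: ask the LLM for code
        self.store = {}            # task key -> known-good source

    def run(self, task, **inputs):
        key = hashlib.sha256(task.encode()).hexdigest()
        source = self.store.get(key)
        if source is None:
            source = self.generate(task)   # non-deterministic path
        ns = dict(inputs)
        exec(source, ns)                   # expected to define `result`
        self.store[key] = source           # cached only after success
        return ns["result"]
```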
hintymad•4h ago
Yes, and the other way around: use the deterministic methods to generate the best possible input to LLM.
seunosewa•3h ago
Can you give an example so we can visualise this?
hintymad•1h ago
For instance, in an AIOps project we still run a number of time-series algorithms and then feed the results, along with the original time-series data, to the LLM. The LLM produces much more relevant and in-depth analysis than it would using the raw data alone as input.
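A sketch of that pattern, assuming a simple z-score anomaly check as the deterministic step (the threshold and prompt shape are illustrative, not from the project described):

```python
import statistics

def build_llm_input(series, threshold=3.0):
    """Run deterministic time-series checks first, then hand both the
    findings and the raw data to the model."""
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)
    anomalies = [
        (i, x) for i, x in enumerate(series)
        if stdev and abs(x - mean) / stdev > threshold
    ]
    findings = (
        f"mean={mean:.2f}, stdev={stdev:.2f}, "
        f"anomalies at indices {[i for i, _ in anomalies]}"
    )
    return (
        "Deterministic analysis results:\n"
        f"{findings}\n\n"
        f"Raw series:\n{series}\n\n"
        "Explain the likely cause of the anomalies."
    )
```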
nowittyusername•3h ago
I agree. You want to use as little LLM as possible in your workflows.
mort96•2h ago
I've been developing software for decades without LLMs, turns out you can get away with very little!
obiefernandez•3h ago
My team at Shopify just open sourced Roast [1] recently. It lets us embed non-deterministic LLM jobs within orchestrated workflows. Essential when trying to automate work on codebases with millions of lines of code.

[1] https://github.com/shopify/roast

drewda•2h ago
Nice to see Ruby continuing to exist and deliver... even in the age of "AI"
TheTaytay•40m ago
Wow - Roast looks fantastic. You architected and put names and constraints on some things that I've been wrestling with for a while. I really like how you are blending the determinism and non-determinism. (One thing that is not obvious to me after reading the README a couple of times (quickly), is whether/how the LLM can orchestrate multiple tool calls if necessary and make decisions about which tools to call in which order. It seems like it does when you tell it to refactor, but I couldn't tell if this would be suitable for the task of "improve, then run tests. Repeat until done.")
codyb•3h ago
I'm slightly confused as to why you'd use a LLM to sort structured data in the first place?
jngiam1•3h ago
The goal is to do more complex data processing, like build dashboards, agentically figure out which tickets are stalled, do a quarterly review of things done, etc. Sorting is a tiny task in the bigger ones, but hopefully more easily exemplifies the problem.
kikimora•1h ago
I don’t understand how this can work. Given the probabilistic nature of LLMs, the more steps you have, the more chances something goes off. What good is a dashboard if you cannot be sure it was not partially hallucinated?
staunton•1h ago
> What is good in the dashboard if you cannot be sure it was not partially hallucinated?

A lot of the time the dashboard contents don't actually matter anyway; it just needs to look pretty...

On a serious note, the systems being built now will eventually be "correct enough most of the time" and that will be good enough (read: cheaper than doing it any other way).

orbital-decay•28m ago
Probabilistic nature means nothing on its own. An LLM that can solve your deterministic task will easily assign 100% to the correct answer (or 99%; the noise floor can be truncated with a sampler). If it doesn't do that and its replies are unstable, it cannot solve the task confidently. That happens to every LLM on a sufficiently complex task, but it's not related to their probabilistic nature.

Of course that still doesn't mean that you should do that. If you want to maximize model's performance, offload as much distracting stuff as possible to the code.

koakuma-chan•2h ago
> TL;DR: Giving LLMs the full output of tool calls is costly and slow.

Is this true for all tool calls? Even if the tool returns little data?

fullstackchris•2h ago
From my experience it's about the speed of a very competent human. One of my favorite custom tools I've written is just access to a series of bash commands. I haven't tested with other models, but Claude very quickly browses through files, reads them, and so on to do whatever you prompted. Even then it's all contextual: for example, I had to remove 'find' because, as one would expect, running 'find' against a huge directory tree is very slow!
fullstackchris•2h ago
This is exactly what I've encountered, at least with Claude: it writes out huge artifacts (static ones retrieved from the file system or wherever) character for character. What I'm going to try this weekend is integrating a Redis cache or SQLite into the MCP tool calls, so Claude doesn't have to write everything out character by character... no idea if it will work as expected...

I'm also looking into "fire and forget" tools, to see if that is even possible.

mehdibl•2h ago
You don't have to do full writes.

Use grep and line-range edits, applied in sequence, instead of rewriting full files.

This way you can edit files with 50k LOC without issue, while Claude will blow up if you ever try to write such a file out in full.
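A sketch of the grep-and-edit-lines tool surface being described (hypothetical helpers, not any real MCP server's API):

```python
import re

def grep(path, pattern):
    """Return matching (line_number, text) pairs so the model can locate
    the region to edit without reading the whole file."""
    with open(path) as f:
        return [(i, line.rstrip("\n")) for i, line in enumerate(f, 1)
                if re.search(pattern, line)]

def edit_lines(path, start, end, replacement):
    """Replace lines start..end (1-indexed, inclusive) instead of having
    the model rewrite the whole file."""
    with open(path) as f:
        lines = f.readlines()
    new = replacement.splitlines(keepends=True)
    if new and not new[-1].endswith("\n"):
        new[-1] += "\n"
    lines[start - 1:end] = new
    with open(path, "w") as f:
        f.writelines(lines)
```

The model's output per edit is then proportional to the change, not to the file size.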

mehdibl•2h ago
The issue is not in the function calls but in HOW the MCPs got designed and how you are using them.

Most MCPs just replicate an API, returning blobs of data.

1. This spends a lot of input context on formatting, escaping JSON inside what is already JSON. 2. It contains a lot of irrelevant information that you could save on.

So the issue is the MCP tool. It should flatten the data as much as possible, since it goes back through JSON encoding again, and drop unneeded fields where possible.

The MCP SaaS offerings here are mainly API gateways. That's what brings this noise, and most of all, they are not optimizing their MCPs.
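As an illustration, flattening a nested API payload into dotted key/value lines might look like this (the dropped field names are assumptions about typical API noise, not any specific service):

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON-ish data into dotted key/value lines, dropping
    noise fields, so tool output costs fewer tokens than escaped JSON."""
    drop = {"self", "expand", "avatarUrls"}  # assumed noise fields
    if isinstance(obj, dict):
        lines = []
        for k, v in obj.items():
            if k not in drop:
                lines += flatten(v, f"{prefix}{k}.")
        return lines
    if isinstance(obj, list):
        lines = []
        for i, v in enumerate(obj):
            lines += flatten(v, f"{prefix}{i}.")
        return lines
    # Leaf value: emit one "path: value" line.
    return [f"{prefix[:-1]}: {obj}"]
```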

CSMastermind•2h ago
LLMs clearly struggle when presented with JSON, especially large amounts of it.

There's nothing stopping your endpoints from returning data in some other format. LLMs actually seem to excel with XML for instance. But you could just use a template to define some narrative text.

ryoshu•2h ago
I'm consistently surprised that people don't use XML for LLMs as the default given XML comes with built-in semantic context. Convert the XML to JSON output deterministically when you need to feed it to other pipelines.
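A sketch of that round trip using the standard library (the naive `xml_to_dict` here drops lists and repeated tags; a real converter would need to handle those):

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Render a JSON-ish value as XML, whose named tags carry semantic
    context the model can latch onto."""
    el = ET.Element(tag)
    if isinstance(value, dict):
        for k, v in value.items():
            el.append(to_xml(k, v))
    elif isinstance(value, list):
        for v in value:
            el.append(to_xml("item", v))
    else:
        el.text = str(value)
    return el

def xml_to_dict(el):
    """Deterministic inverse for feeding downstream JSON pipelines."""
    if len(el) == 0:
        return el.text
    return {child.tag: xml_to_dict(child) for child in el}
```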
iJohnDoe•1h ago
Any reason for this, for my own learning? Was XML more prevalent during training? Or is there something about XML that makes it easier for the LLM to work with?

XML seems more text-heavy, more tokens. However, maybe the extra context helps?

bguberfain•2h ago
I think there may be another solution for this: have the LLM write valid code that calls the MCPs as functions. Think of it as a Python script where each MCP is mapped to a function. A simple example:

  def process(param1, param2):
      my_data = mcp_get_data(param1)
      sorted_data = mcp_sort(my_data, by=param2)
      return sorted_data
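A runnable version of that sketch, with stub functions standing in for real MCP bindings (`mcp_get_data`, `mcp_sort`, and `run_llm_script` are all hypothetical names):

```python
# Stub MCP bindings; in a real system each would proxy a tool server.
def mcp_get_data(source):
    return [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]

def mcp_sort(data, by):
    return sorted(data, key=lambda row: row[by])

# The only names the generated script is allowed to see.
SANDBOX = {"mcp_get_data": mcp_get_data, "mcp_sort": mcp_sort}

def run_llm_script(source, entry, *args):
    """Execute model-written code with only the MCP functions in scope,
    then call its entry-point function."""
    ns = dict(SANDBOX)
    exec(source, ns)  # a real runtime would sandbox this execution
    return ns[entry](*args)
```

The intermediate data then flows between tools inside the script instead of round-tripping through the model's context.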
jngiam1•2h ago
Yes! If you want to see how this can work in practice, check out https://lutra.ai ; we've been using a similar pattern there. The challenge is making the code runtime work well for it.
padjo•1h ago
Sorry I’ve been out of the industry for the last year or so, is this madness really what people are doing now?
_se•1h ago
No, not most people. But some people are experimenting.

No one has found anything revolutionary yet, but there are some useful applications to be sure.

norcalkc•1h ago
> Allowing an execution environment to also access MCPs, tools, and user data requires careful design to where API keys are stored, and how tools are exposed.

If your tools are calling APIs on behalf of users, it's better to use OAuth flows so users of the app can give explicit consent to the APIs/scopes they want the tools to access. That way, tools make calls with scoped tokens instead of hard-to-manage, hard-to-maintain API keys (or even client credentials).
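A sketch of the scoped-token idea (all class and function names are hypothetical; a real system would use an OAuth token exchange and persistent, encrypted storage):

```python
class ScopedTokenStore:
    """Per-user access tokens with consented scopes, instead of one
    global API key shared by every tool call."""

    def __init__(self):
        self._tokens = {}  # user_id -> {"token": ..., "scopes": set}

    def grant(self, user_id, token, scopes):
        # Recorded after the user completes an OAuth consent flow.
        self._tokens[user_id] = {"token": token, "scopes": set(scopes)}

    def token_for(self, user_id, required_scope):
        entry = self._tokens.get(user_id)
        if entry is None or required_scope not in entry["scopes"]:
            raise PermissionError(
                f"user {user_id} has not consented to {required_scope}")
        return entry["token"]

def call_tool(store, user_id, scope, request):
    """A tool resolves the caller's scoped token at call time; the LLM
    and the execution environment never see raw credentials."""
    token = store.token_for(user_id, scope)
    return request(token)  # e.g. an HTTP call with an Authorization header
```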

iandanforth•1h ago
Do you know of any examples which use MCP and oauth cleanly?
darkteflon•1h ago
We’ve been using smolagents, which takes this approach, and are impressed.

Slight tangent, but as a long term user of OpenAI models, I was surprised at how well Claude Sonnet 3.7 through the desktop app handles multi-hop problem solving using tools (over MCP). As long as tool descriptions are good, it’s quite capable of chaining and “lateral thinking” without any customisation of the system or user prompts.

For those of you using Sonnet over API: is this behaviour similar there out of the box? If not, does simply pasting the recently exfiltrated[1] “agentic” prompt into the API system prompt get you (most of the way) there?

[1] https://news.ycombinator.com/item?id=43909409

3abiton•45m ago
How does it compare to MCP servers?
darkteflon•2m ago
Not sure if I correctly understand your question. I was saying that Sonnet 3.7 in the desktop app is good out-of-the-box at orchestrating tools exposed as MCP servers and asking whether that behaviour is also present over the Anthropic API.
arjunchint•1h ago
I am kind of confused: why can't you just create a new MCP tool that encapsulates the parsing and other required steps together in a code block?

That would be more reliable than expecting the LLM to generate working code 100% of the time.

Centigonal•51m ago
You should for sure do this for common post-processing tasks. However, you're usually not going to know at design time all the types of post-processing users will want to apply to tool call output.
darkteflon•1h ago
What are the current best options for sandboxed execution environments? HuggingFace seems to have a tie-up with E2B, although by default smolagents runs something ephemeral in-process. I feel like there must be a good Docker container solution to this that doesn’t require signing up to yet another SaaS. Any recommendations?
colonCapitalDee•50m ago
Try gVisor
iLoveOncall•1h ago
That's MCP for you.

MCP is literally just a wrapper around an API call, but because it has some LLM buzz sprinkled on top, people expect it to do some magic, when they wouldn't expect the same magic from the underlying API.

stavros•41m ago
I would really like to see output-aware LLM inference engines. For example, imagine if the LLM output some tokens that meant "I'm going to do a tool call now", and the inference engine (e.g. llama.cpp) changed the grammar on the fly so the next token could only be valid for the available tools.

Or, if I gave the LLM a list of my users and asked it to filter based on some criteria, the grammar would change to only output user IDs that existed in my list.

I don't know how useful this would be in practice, but at least it would make it impossible for the LLM to hallucinate for these cases.
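A character-level toy of that constraint (real inference engines mask token logits rather than characters, e.g. via grammar-constrained sampling; `pick` stands in for the sampler):

```python
def allowed_next_chars(allowed_strings, generated_so_far):
    """Grammar-style constraint: given what has been emitted, return the
    set of next characters that keep the output a prefix of some allowed
    string."""
    nxt = set()
    for s in allowed_strings:
        if s.startswith(generated_so_far) and len(s) > len(generated_so_far):
            nxt.add(s[len(generated_so_far)])
    return nxt

def constrained_decode(allowed_strings, pick):
    """`pick` chooses among permitted characters only, so a hallucinated
    ID is impossible by construction."""
    out = ""
    while out not in allowed_strings:
        choices = allowed_next_chars(allowed_strings, out)
        if not choices:
            raise ValueError(f"dead end at {out!r}")
        out += pick(sorted(choices))
    return out
```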