
Dexterous robotic hands: 2009 – 2014 – 2025

https://old.reddit.com/r/robotics/comments/1qp7z15/dexterous_robotic_hands_2009_2014_2025/
1•gmays•2m ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•ksec•11m ago•1 comment

JobArena – Human Intuition vs. Artificial Intelligence

https://www.jobarena.ai/
1•84634E1A607A•15m ago•0 comments

Concept Artists Say Generative AI References Only Make Their Jobs Harder

https://thisweekinvideogames.com/feature/concept-artists-in-games-say-generative-ai-references-on...
1•KittenInABox•19m ago•0 comments

Show HN: PaySentry – Open-source control plane for AI agent payments

https://github.com/mkmkkkkk/paysentry
1•mkyang•20m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
1•ShinyaKoyano•30m ago•0 comments

The Crumbling Workflow Moat: Aggregation Theory's Final Chapter

https://twitter.com/nicbstme/status/2019149771706102022
1•SubiculumCode•34m ago•0 comments

Pax Historia – User and AI powered gaming platform

https://www.ycombinator.com/launches/PMu-pax-historia-user-ai-powered-gaming-platform
2•Osiris30•35m ago•0 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
1•ambitious_potat•41m ago•0 comments

Scams, Fraud, and Fake Apps: How to Protect Your Money in a Mobile-First Economy

https://blog.afrowallet.co/en_GB/tiers-app/scams-fraud-and-fake-apps-in-africa
1•jonatask•41m ago•0 comments

Porting Doom to My WebAssembly VM

https://irreducible.io/blog/porting-doom-to-wasm/
1•irreducible•42m ago•0 comments

Cognitive Style and Visual Attention in Multimodal Museum Exhibitions

https://www.mdpi.com/2075-5309/15/16/2968
1•rbanffy•43m ago•0 comments

Full-Blown Cross-Assembler in a Bash Script

https://hackaday.com/2026/02/06/full-blown-cross-assembler-in-a-bash-script/
1•grajmanu•48m ago•0 comments

Logic Puzzles: Why the Liar Is the Helpful One

https://blog.szczepan.org/blog/knights-and-knaves/
1•wasabi991011•1h ago•0 comments

Optical Combs Help Radio Telescopes Work Together

https://hackaday.com/2026/02/03/optical-combs-help-radio-telescopes-work-together/
2•toomuchtodo•1h ago•1 comment

Show HN: Myanon – fast, deterministic MySQL dump anonymizer

https://github.com/ppomes/myanon
1•pierrepomes•1h ago•0 comments

The Tao of Programming

http://www.canonical.org/~kragen/tao-of-programming.html
2•alexjplant•1h ago•0 comments

Forcing Rust: How Big Tech Lobbied the Government into a Language Mandate

https://medium.com/@ognian.milanov/forcing-rust-how-big-tech-lobbied-the-government-into-a-langua...
3•akagusu•1h ago•0 comments

PanelBench: We evaluated Cursor's Visual Editor on 89 test cases. 43 fail

https://www.tryinspector.com/blog/code-first-design-tools
2•quentinrl•1h ago•2 comments

Can You Draw Every Flag in PowerPoint? (Part 2) [video]

https://www.youtube.com/watch?v=BztF7MODsKI
1•fgclue•1h ago•0 comments

Show HN: MCP-baepsae – MCP server for iOS Simulator automation

https://github.com/oozoofrog/mcp-baepsae
1•oozoofrog•1h ago•0 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
7•DesoPK•1h ago•4 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
1•rs545837•1h ago•1 comment

Hello world does not compile

https://github.com/anthropics/claudes-c-compiler/issues/1
35•mfiguiere•1h ago•20 comments

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
3•meszmate•1h ago•0 comments

Metaphor+Metonymy: "To love that well which thou must leave ere long" (Sonnet 73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•1h ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•1h ago•1 comment

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•1h ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•2h ago•1 comment

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
5•gmays•2h ago•1 comment

Effective context engineering for AI agents

https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
148•epenson•4mo ago

Comments

CuriouslyC•4mo ago
The article doesn't really give helpful advice here, but please don't vibe this.

Create evals from previous issues and current tests. Use DSPy on prompts. Create hypotheses for the value of different context packs, and run an eval matrix to see what actually works and what doesn't. Instrument your agents with Otel and stratify failure cases to understand where your agents are breaking.
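A minimal sketch of that eval-then-optimize loop, in case it helps; the triage task, the examples, and the model name are made up, and BootstrapFewShot is just one of several DSPy optimizers:

```python
# Eval-then-optimize sketch with DSPy; task, examples, and model are
# hypothetical placeholders.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class Triage(dspy.Signature):
    """Route a bug report to the component that owns it."""
    report: str = dspy.InputField()
    component: str = dspy.OutputField()

program = dspy.Predict(Triage)

# Evals built from previous issues: known-good (input, expected) pairs.
trainset = [
    dspy.Example(report="Crash on login when offline", component="auth").with_inputs("report"),
    dspy.Example(report="Chart axis labels overlap", component="ui").with_inputs("report"),
]

def metric(example, pred, trace=None):
    return example.component == pred.component

# Let the optimizer tune the prompt against the eval set, then score it.
optimized = dspy.BootstrapFewShot(metric=metric).compile(program, trainset=trainset)
score = dspy.Evaluate(devset=trainset, metric=metric)(optimized)
```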

typpilol•4mo ago
How hard is dspy to set up?

Isn't it a programming language type thing?

Can you even integrate that into an existing codebase easily?

CuriouslyC•4mo ago
It's pretty straightforward; different optimizers have different requirements. Some require example inputs/outputs, others will just optimize on whatever you've got. You can use codex/claude code to set it up in order to bootstrap quickly; they're decent at it.
koakuma-chan•4mo ago
Does dspy support structured outputs?
CjHuber•4mo ago
Yes using signatures with types
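A tiny hypothetical example of what that looks like; DSPy coerces the model's answer into the annotated Python types:

```python
# Typed signature sketch: Literal acts as an enum, list[str] is parsed for
# you. Assumes dspy.configure(lm=...) has already been called.
from typing import Literal
import dspy

class Extract(dspy.Signature):
    """Pull structured fields out of a support ticket."""
    ticket: str = dspy.InputField()
    severity: Literal["low", "medium", "high"] = dspy.OutputField()
    tags: list[str] = dspy.OutputField()

result = dspy.Predict(Extract)(ticket="Checkout page 500s for EU users")
print(result.severity, result.tags)
```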
ijk•4mo ago
Yes, I was using it for structured outputs before the dedicated structured outputs got their act together.
wanderingmind•4mo ago
Otel meaning OpenTelemetry? Do they have special capability for tracking agents?
CuriouslyC•4mo ago
Yes, there is an otel standard for agent traces. You can instrument agents that don't natively support Otel via bifrost.
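A minimal sketch of hand-instrumenting one agent step with the (still incubating) GenAI semantic conventions; the tracer name and attribute values are placeholders, and the attribute names follow the spec drafts, so check them against your semconv version:

```python
# One span per agent step, tagged with GenAI semconv attributes.
from opentelemetry import trace

tracer = trace.get_tracer("my-agent")  # tracer name is a placeholder

with tracer.start_as_current_span("invoke_agent planner") as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.request.model", "claude-sonnet-4")
    # ... make the actual model call here ...
    span.set_attribute("gen_ai.usage.input_tokens", 1234)
    span.set_attribute("gen_ai.usage.output_tokens", 256)
```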
ivape•4mo ago
I think any meaningful context engineering strategies will be trade secrets.
lomase•4mo ago
Imagine where we would be if academia or open source had this train of thought.

No algorithms, no Linux, no open protocols, maybe not even the internet.

ivape•4mo ago
Sure, it’s a horrible attitude. With that said, there is a time and place for everything. At the very beginning of AI, which is where we are, it’s not necessarily evil to carve out your advantages and share later.
saltyoldman•4mo ago
Maybe, but we'll be getting to a place where each LLM call is cheaper, faster, and has a larger context; it may not matter long term.
SOLAR_FIELDS•4mo ago
Context is often not the only issue. Really the issue is attention - context size is a factor in how well the LLM handles attention to the broad scope of a task, but one can easily observe the model forgetting things or going off the rails even when only a fraction of the context window is in use. Oftentimes it’s effective to just say “don’t ever go above 20% of the max”.
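That heuristic is easy to enforce mechanically; a toy sketch, with tiktoken's cl100k_base and a 200k window standing in for whatever tokenizer and model you actually use:

```python
# Toy guardrail for the "stay under ~20% of the window" heuristic;
# tokenizer and window size are illustrative stand-ins.
import tiktoken

MAX_CONTEXT = 200_000
enc = tiktoken.get_encoding("cl100k_base")

def within_budget(messages: list[str], frac: float = 0.2) -> bool:
    used = sum(len(enc.encode(m)) for m in messages)
    return used <= frac * MAX_CONTEXT
```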
ijk•4mo ago
Some of that is, or at least was, down to the training: extending the context window but not training on sufficiently long data or using weak evaluation metrics caused issues. More recent models have been getting better, though long context performance is still not as good as short context performance, even if the definition of "short context" has been greatly extended.

RoPE is great and all, but doesn't magically give 100% performance over the lengthened context; that takes more work.
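A rough sketch of the mechanics, for anyone curious: RoPE encodes position m as rotation angles m * theta_i, and interpolation-style extensions rescale m into the trained range, which is precisely the part that needs further training to recover full quality:

```python
# NumPy sketch of RoPE's angle schedule under position interpolation;
# dimensions and lengths are illustrative.
import numpy as np

def rope_angles(m: int, d: int = 64, base: float = 10000.0,
                train_len: int = 4096, ext_len: int = 16384) -> np.ndarray:
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)       # per-pair rotation frequencies
    m_scaled = m * train_len / ext_len   # interpolate instead of extrapolating
    return m_scaled * theta              # angles applied to query/key pairs
```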

layer8•4mo ago
Why do you think that?
ivape•4mo ago
Competitive edge. Some agents will be better than others, therefore worth paying for. So for example, if one writes an AI trading agent, there’s no reason to share it similar to how it is at the moment with regular trading algos.

I’m not saying it won’t eventually be known, but not in these initial stages.

The only thing separating Claude, Gemini and ChatGPT is their context and prompt engineering, assuming the frontier models belong to the same class of capability. You can absolutely release a competitor to these things that could perform better for certain things (or even all things, if you introduce brand new context engineering ideas), if you wanted to.

layer8•4mo ago
No, I mean why do you think that effective context engineering will remain a black art, instead of becoming something with standard practices that work well for most use cases?
ivape•4mo ago
I can’t say it will remain a black art because the tech itself creates new paradigms constantly. An LLM can be fine tuned with context engineering examples, similar to Chain Of Thought tuning, and that’s how we get a reasoning loop. With enough fine tuning, we could get a similar context loop, in which case those keeping things hidden will be washed away with new paradigms.

Even if someone fine-tuned an LLM with this type of data, DeepSeek has shown that they can just use a teacher-student strategy to steal from whatever model you trained (exfiltrate your value-add, which is how they stole from OpenAI). Stealing is already a thing in this space, so don’t be shocked if over time you see a lot more protectionism (protectionism is something we already see geopolitically on the hardware front).

I don’t know what’s going to happen, but I can confidently say that if humans are involved at this stage, there will absolutely be some level of information siloing, and stealing.

——

But to directly answer your question:

”… instead of becoming something with standard practices that work well for most use cases?”

In no uncertain terms, the answer is because of money.

mupuff1234•4mo ago
Idk if trade secrets really exist in a world where engineers at every level hop between the same x companies every other Monday.
SOLAR_FIELDS•4mo ago
These companies all wax on about how important context engineering is, yet not one of them has released acceptable tooling for end users to visualize and understand the context window as it grows and shrinks during a session. The best Claude Code can do? Warn you when you hit 80% full.
krystofee•4mo ago
try /context in Claude Code
grim_io•4mo ago
A very crude tool. A good start maybe, but it does not give us any information about the message part of the context, the one that matters.

We can't really do much with the information that x amount is reserved for MCP, tool calling or the system prompt.

simonbw•4mo ago
> We can't really do much with the information that x amount is reserved for MCP, tool calling or the system prompt.

I actually think this is pretty useful information. It helps you evaluate whether an MCP server is worth the context cost, and similarly gives you a feel for how much context certain tool uses eat up. And since there are ways to change the system prompt, it also helps you evaluate whether what you've got there is worth it.

grim_io•4mo ago
Sure, it's useful, once.

What we need is a way to manage the dynamic part of the context without just starting from zero each time.

SOLAR_FIELDS•4mo ago
My theory is that you will never get this from a frontier model provider because, as alluded to in the sibling thread, context window management is actually a good chunk of the secret sauce that makes these things effective, and companies do not want to give that up.
joshribakoff•4mo ago
Cursor has a circular progress bar for context usage.
sublimefire•4mo ago
It’s kind of useful, but I suppose they just admit that the failure rate increases with large context windows. My guess is that’s what happened with the presentation of those Meta glasses, where the model would not do what was asked.

Another interesting thought might be that long-horizon tasks need different tooling, and with the shift to long-running tasks you can use cheaper models as well. None of the big providers have good tools for that at the moment, so the only thing they can say is: fix your contexts but still use their models.

llm-cool-j•4mo ago
I find you can give it a task and the full context in your 1st message, and also include (a) asking what files are needed to understand and complete the task, and (b) asking if there’s anything ambiguous about the task/question. Then, when you get the response, create a new chat with just the files it recommends, and the ambiguities explained in the 1st message. Sometimes you need a couple of rounds of this.

Then you will have a good starting point, with less chance of running out of space before solving the task.

If you can’t give it full context at the beginning, you can give it a tree listing of the files involved, and maybe a couple of READMEs (if there are any), and ask it to see if it can work out which files are needed, giving it a couple of files at a time, at its suggestion.
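A rough sketch of that two-round flow with the OpenAI SDK; the model name, the task, and the angle-bracket placeholders are illustrative, not prescriptive:

```python
# Round 1 scopes the task; round 2 starts a fresh chat with only the
# files the model itself asked for.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Round 1: full context plus the two scoping questions.
scoping = ask(
    "TASK: fix the flaky retry logic.\n\n<full repo context here>\n\n"
    "(a) Which files are needed to understand and complete this task?\n"
    "(b) Is anything ambiguous about the task?"
)

# Round 2: a fresh chat seeded with only the recommended files, with
# the ambiguities resolved up front.
answer = ask(
    "TASK: fix the flaky retry logic.\n\n"
    "Relevant files only:\n<files recommended in round 1>\n\n"
    "Clarifications:\n<your answers to the ambiguities>"
)
```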

lyu07282•4mo ago
I think "output engineering" is equally as important, and steering with grammar (structured output with json schema or CFGs directly) is a huge win there I find:

https://platform.openai.com/docs/guides/function-calling#con...
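For example, the JSON-schema flavor (the schema and model name are illustrative):

```python
# Grammar-constrained output via OpenAI's json_schema response format:
# the reply is guaranteed to be JSON matching the schema.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize: the build failed on ARM."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "summary",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "severity": {"type": "string", "enum": ["info", "warning", "error"]},
                },
                "required": ["summary", "severity"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)
print(resp.choices[0].message.content)  # JSON matching the schema
```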

CuriouslyC•4mo ago
Oh yeah, this is huge! I instruct agents to do a few things in this vein that are big improvements:

1. Have agents emit chatter in a structured format. Have them emit hypotheses, evidence, counterfactuals, invariants, etc. Fully natural-language agent chatter is shit for observability, and if you have structured agent output you can run very powerful script hooks in response to agent input/output.

2. Have agents summarize key evidence from tool calls, then just drop the tool call output from context (you can give them a tool to retrieve the old value without recomputation: cache tool output in Redis and give them a key to retrieve it later if needed). Tool calls dominate context bloat, and once you've extracted the evidence, the original tool output is very low value.
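A sketch of point 2 with redis-py; the key scheme and helper names are hypothetical:

```python
# Stash raw tool output in Redis, keep only a one-line summary plus a
# retrieval key in the agent's context.
import hashlib
import redis

r = redis.Redis()

def compact_tool_result(tool_name: str, raw_output: str, summary: str) -> str:
    key = f"tool:{tool_name}:{hashlib.sha256(raw_output.encode()).hexdigest()[:12]}"
    r.setex(key, 3600, raw_output)  # keep the full output for an hour
    # Only this line goes back into the context window.
    return f"[{tool_name}] {summary} (full output: retrieve_tool_output('{key}'))"

def retrieve_tool_output(key: str) -> str:
    val = r.get(key)
    return val.decode() if val else "expired; re-run the tool"
```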

elpakal•4mo ago
I’ve been playing around with Apple’s Foundation Models; their on-device LLM has a 4k context window. That’s really been an interesting exercise in context engineering, coming from others like Claude and GPT. I think those larger context windows have made me take context engineering for granted.
kcartlidge•3mo ago
Why are we hearing that "studies" have "uncovered the concept of context rot as the number of tokens in the context window increases"? It's obvious, and we've always known this.

Agents are stateless, hence the need for context. This means that all they know about the ongoing session is what's in that context (generally speaking). As the context grows any particular element within it becomes a smaller and smaller percentage of the whole. The LLM is not 'losing focus'; it's being diluted with more tokens. But then I suppose anthropomorphism comes naturally to a company named Anthropic, and 'losing focus' does make it sound more human.

They didn't need a study and article, but it likely contributes towards the mystique. Hence the use of phrases like "this results in n² pairwise relationships for n tokens" to make it sound more erudite and revelatory.