frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

How FIDO2 works, a technical deep dive

https://michaelwaterman.nl/2025/04/02/how-fido2-works-a-technical-deep-dive/
1•xeonmc•1m ago•0 comments

Claude Code's System Prompt

https://gist.github.com/kylecarbs/21f9f5cd643f4f5d2a05f97cdcd34bde
3•kylecarbs•9m ago•0 comments

Retailer Temu's daily US users halve following end of 'de minimis' loophole

https://www.reuters.com/business/retail-consumer/retailer-temus-daily-us-users-halve-following-end-de-minimis-loophole-2025-06-02/
2•TMWNN•11m ago•1 comments

What Is Quishing? How Hackers Use QR Codes to Steal Your Data

https://www.youtube.com/watch?v=RVF6NVnJvd8
2•Brysonbw•14m ago•0 comments

Your Manager Is Not Your Best Friend

https://staysaasy.com/management/2025/06/02/your-manager-is-not-your-best-friend.html
1•thisismytest•17m ago•0 comments

Science-integrity project will root out bad medical papers 'and tell everyone'

https://www.nature.com/articles/d41586-025-01739-z
1•gnabgib•17m ago•0 comments

Release: QuadParts – FPV Drone Inventory SelfHosted Application

https://github.com/hasmeni/QuadParts
1•alklinewebb•20m ago•0 comments

We turned public transit into a multiplayer game

https://blog.transitapp.com/autogo/
1•fluxic•23m ago•0 comments

Running FreeDOS inside a Pokémon Emerald save file [video]

2•thepbone•24m ago•2 comments

Anthropic decided to cut off all of windsurf capacity to all Claude 3.x models

https://twitter.com/_mohansolo/status/1930034960385356174
3•xiaom•28m ago•0 comments

Quality and Taste

https://mfelix.org/stories/quality-and-taste/
3•EFFALO•32m ago•0 comments

Ask HN: Any good productivity tools out there?

2•jenever•39m ago•9 comments

Is Strava's "Athlete Intelligence" useful?

https://old.reddit.com/r/Strava/comments/1l0zw94/is_athlete_intelligence_actually_useful/
1•apwell23•41m ago•0 comments

Statement on Anthropic Model Availability

https://windsurf.com/blog/anthropic-models
2•georgehill•41m ago•0 comments

Coding Through Chaos: Addiction, Recovery and Acceptance

https://corecursive.com/coding-through-chaos-with-john-walker/
1•todsacerdoti•43m ago•0 comments

Windsurf says Anthropic is limiting its direct access to Claude AI models

https://techcrunch.com/2025/06/03/windsurf-says-anthropic-is-limiting-its-direct-access-to-claude-ai-models/
5•mikece•46m ago•1 comments

SkyPlanter – Drone-mounted seedling planting system [video]

https://www.youtube.com/watch?v=o_D8JCQ2mX4
1•gnabgib•48m ago•0 comments

I discovered that Bill Gates monopolized ACPI in order to break Linux

https://enaix.github.io/2025/06/03/acpi-conspiracy.html
3•_JamesA_•49m ago•2 comments

Canadian wildfire smoke blankets swath of North America

https://earthsky.org/earth/canadian-wildfire-smoke-north-america-june-2025/
2•geox•1h ago•0 comments

BuildPad – A platform that helps founders go from idea to successful product

https://buildpad.io/
3•peter_d_sherman•1h ago•0 comments

Economics and labor rights in AI skepticism

https://henry.codes/writing/economics-and-labor-rights-in-ai-skepticism/
1•OuterVale•1h ago•0 comments

Meta Signs Nuclear Power Deal to Fuel Its AI Ambitions

https://www.wsj.com/business/energy-oil/meta-signs-nuclear-power-deal-to-fuel-its-ai-ambitions-70c85367
2•bookofjoe•1h ago•1 comments

The HTTP Query Method

https://httpwg.org/http-extensions/draft-ietf-httpbis-safe-method-w-body.html
2•ronbenton•1h ago•0 comments

AI LLMs can't count lines in a file

13•sha-69•1h ago•18 comments

What do software developers need to know to succeed in an age of AI?

https://arxiv.org/abs/2506.00202
2•turadg•1h ago•0 comments

Show HN: AI Email Prioritizer – Auto-Organize Gmail with Nvidia LLM

2•anandvc•1h ago•0 comments

Hacksaw: Hardware-Centric Kernel Debloating (2023) [pdf]

https://www.microsoft.com/en-us/research/wp-content/uploads/2023/07/hacksaw-ccs23.pdf
1•peter_d_sherman•1h ago•0 comments

Mikko Hypponen Leaves Anti-Malware Industry to Fight Against Drones

https://www.securityweek.com/mikko-hypponen-joins-anti-drone-company-sensofusion/
1•r721•1h ago•0 comments

Subtxt/Dramatica

https://subtxt.app
2•turtleyacht•1h ago•1 comments

Show HN: Code Search Mcp for GitHub

https://github.com/edelauna/github-semantic-search-mcp/tree/dev/workflow
1•wwdmaxwell•1h ago•1 comments
Open in hackernews

Show HN: System Prompt Learning – LLMs Learn Problem-Solving from Experience

47•codelion•1d ago
I built a system that lets LLMs automatically learn and improve problem-solving strategies over time, inspired by Andrej Karpathy's idea of a "third paradigm" for LLM learning.

The basic idea: instead of using static system prompts, the LLM builds up a database of strategies that actually work for different problem types. When you give it a new problem, it selects the most relevant strategies, applies them, then evaluates how well they worked and refines them.

For example, after seeing enough word problems, it learned this strategy:

1) Read carefully and identify unknowns,

2) Define variables with units,

3) Write equations,

4) Solve step-by-step,

5) Verify the answer.

All strategies are stored as human-readable JSON that you can inspect and edit.

I tested it on math benchmarks and saw decent improvements - 8.6% better on Arena Hard, 6.67% on AIME24. After 500 queries, the system had created 129 strategies and refined 97 of them.

The implementation is an open-source plugin for optillm (our inference optimization proxy). It works with any OpenAI-compatible API - you just add "spl-" to your model name. Has two modes: inference-only (uses existing strategies) and learning mode (creates and refines strategies).

What's interesting is that it bridges the gap between the sophisticated system prompts that production AI uses and the basic prompts most of us work with. Your model literally gets better at the types of problems you throw at it.

Built it because I noticed ChatGPT, Claude etc. have incredibly detailed system prompts with problem-solving frameworks, but most developers use basic prompts and miss out on those performance gains. The approach is inspired by Andrej Karpathy's tweet about a "third paradigm" for LLM learning beyond just pretraining and fine-tuning: https://x.com/karpathy/status/1921368644069765486

The strategies are completely transparent - you can see exactly what the system learned and why it's making certain decisions. No black box learning.

https://github.com/codelion/optillm/tree/main/optillm/plugin...

Would love feedback on the approach. Has anyone else experimented with LLMs learning from their own experience?

Comments

codelion•1d ago
Thanks for checking this out! A few additional details that didn't fit in the main post:

The system maintains two separate limits: a storage limit (max 10 strategies per problem type in the database) and an inference limit (max 3 strategies applied per query). This keeps the database manageable while ensuring the system prompt doesn't get too long.

One interesting finding was that strategies only get used for inference once they have at least 5 attempts and a 40% success rate. This prevents the system from applying unproven strategies to new problems.

The approach works particularly well with reasoning models like DeepSeek-R1 and QwQ - the learned strategies seem to guide their thinking process effectively.

I'm especially curious about:

1. How this might work with different model families

2. Whether the community sees value in sharing strategy databases between users

3. Ideas for extending beyond text-based reasoning to multimodal problems

The plugin integrates with our broader optillm project which has other inference optimization techniques. You can combine SPL with methods like mixture-of-agents or MCTS using the "&" operator.

Next I'm thinking about meta-learning - having the system learn how to create better strategies more efficiently. Also exploring collaborative strategy sharing.

Would love to hear thoughts on the approach or if anyone has ideas for other problem domains where this might be useful!

ramonga•1d ago
I would like to see some interesting input/output pairs. Do you have any?
codelion•1d ago
We have some examples in the plugin README: https://github.com/codelion/optillm/tree/main/optillm/plugin...

E.g. This was the strategy discovered by optiLLM for solving word problems:

*Refined Strategy for Solving Word Problems:*

1. *Understand:*\n * Read the problem carefully (multiple times).\n * Identify the question (what are you trying to find?).\n * List all given information (facts, numbers, units).\n * Clarify ambiguous terms/units.

2. *Organize Information & Identify Unknowns:*\n * Choose an organization method: (e.g., table, diagram, list, drawing).\n * Clearly identify the unknowns (what you need to solve for).

3. *Plan and Translate:*\n * Define all variables with units (e.g., `p = number of pennies`, `c = number of compartments`).\n * Identify relationships between knowns and unknowns.\n * Convert units if necessary.\n * Write equations or expressions, including units, that relate the knowns and unknowns.\n * Ensure units are consistent throughout the equations.\n * Outline the solution steps.

4. *Solve:*\n * Show work step-by-step.\n * Track units throughout calculations.\n * Calculate accurately.\n * Solve for the unknowns.\

5. *Evaluate and Verify:*\n * Check if the answer is reasonable.\n * Verify the answer.

6. *Summarize:*\n * State the answer with units

Full list of strategies discovered is available here -https://github.com/codelion/optillm/blob/main/optillm/plugin...

tanchaowen84•1d ago
This is a really cool idea! I recently came across another project on GitHub: https://github.com/tensorzero/tensorzero that explores a similar direction. You might find it interesting, and perhaps it could offer some inspiration or useful insights for your work as well.
yunusabd•1d ago
That's an interesting space to explore! I'm wondering about the baseline in the benchmarks. Which prompts did you use for those? I'm asking because some of the resulting prompts seem fairly generic, and I'm wondering if you could just blanket add them to each prompt and also see an improvement. Things like "Identify the question (what are you trying to find?)".

In the same vein, wouldn't it be interesting to measure which part of the prompt most contributed to better solving the problem? Surely some parts will be just noise and can be trimmed away.

Also wondering what this does, since the model probably won't (can't?) actually read the problem multiple times:

  > Read the problem carefully (multiple times).
codelion•1d ago
Re-reading the problem apparently works well - https://arxiv.org/abs/2309.06275

Here the system seems to have discovered this strategy by itself. The prompts are generic because during learning there is a part to refine and combine them. I haven’t experimented yet by adding all prompts to every query, given the large context it will be interesting to see.

yunusabd•1d ago
Okay, but it looks like in the paper, they are actually adding the question twice in the prompt, not just instructing the model to read it twice. Or am I missing something?
Falimonda•1d ago
How do you forsee a system like this efficiently managing and relying on a set of strategies whose size can become unbounded?
codelion•1d ago
We do not allow the strategies to keep growing there is a refinement phase where we refine and merge existing strategies. The experiments were run with this config - https://github.com/codelion/optillm/blob/main/optillm/plugin... which allows a maximum of 10 strategies of each type.
dedicate•1d ago
If I jump in and, say, manually 'tweak' one of those JSON strategies because I think I have a better idea, what happens next? Does the LLM just roll with my brilliant human intervention, or could it eventually 'learn' that my tweak was actually counterproductive and refine it back (or away from my edit)?
codelion•1d ago
You can run in two modes, by default you run in the inference mode without learning. So, the changes you made will be used. If you switch to learning mode then the strategies are updated/refined and merged based on a config that you can control.

# How often to perform maintenance operations (merge, prune)

MAINTENANCE_INTERVAL = 40

# Strategy selection thresholds

STRATEGY_CREATION_THRESHOLD = 0.7 # Higher threshold to avoid creating similar strategies

STRATEGY_MERGING_THRESHOLD = 0.6 # Lower threshold to merge more similar strategies

MIN_SUCCESS_RATE_FOR_INFERENCE = 0.4 # Minimum success rate for a strategy to be used during inference

The configs are all defined here - https://github.com/codelion/optillm/blob/main/optillm/plugin...

imaltont•1d ago
You should take a look at something called Case-based reasoning. Seems to perfectly fit into the road you are currently walking, as you basically just rediscovered the CBR-cycle.