Why Linguistic Context Outperforms Raw Data for LLM Decision-Making

https://www.prereason.com/evidence/research
2•KalskiTheDan•1h ago

Comments

KalskiTheDan•1h ago
I'm a solo dev, and this has been 6 months in the making. I built a financial context API that returns pre-analyzed market briefings for AI agents (on-chain, macro, regime classification), then ran 7 controlled experiments to find out whether it actually helps or just adds noise.

I kept seeing the same pattern in AI agent demos. You hand an LLM a price feed, it gets {"price": 94200, "change_24h": -2.3}, and it burns half its context window figuring out basics. Is this up from last week? What percentile? How does hash rate correlate? The agent does all that work before it starts reasoning about what to do. So I started pre-computing the analysis server-side and returning ~400 token markdown briefings instead of raw JSON.
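A minimal sketch of the pre-computation idea, not the actual API: the function name `build_briefing`, the field choices, and the markdown layout are all assumptions made for illustration. It answers the "is this up from last week, what percentile" questions server-side so the agent does not have to derive them from raw numbers.

```python
def build_briefing(asset: str, closes: list[float]) -> str:
    """Turn daily closes (oldest first) into a short markdown briefing.

    Hypothetical helper: the real briefings also cover on-chain and macro
    data, which this sketch does not attempt to reproduce.
    """
    if len(closes) < 8:
        raise ValueError("need at least 8 daily closes")
    current = closes[-1]
    change_24h = (current / closes[-2] - 1) * 100
    change_7d = (current / closes[-8] - 1) * 100
    # Share of closes in the window that sit strictly below the current price
    below = sum(1 for c in closes if c < current)
    percentile = 100 * below / len(closes)
    if change_7d < 0:
        trend = "down"
    elif change_7d > 0:
        trend = "up"
    else:
        trend = "flat"
    return "\n".join([
        f"## {asset} briefing",
        f"- Price: {current:,.0f}",
        f"- 24h: {change_24h:+.1f}%",
        f"- 7d: {change_7d:+.1f}% ({trend} on the week)",
        f"- Percentile in {len(closes)}-day window: {percentile:.0f}",
    ])
```

The agent then receives a few lines of pre-digested context instead of `{"price": ..., "change_24h": ...}` and can spend its tokens on the decision itself.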

The experiment: 4-arm RCT. Treatment gets real-time briefings. Control gets price only. A third arm uses web search instead of briefings. Placebo gets the same briefings but time-shifted 5-7 months, presented as current. All arms run Claude, one trading decision per tick.
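The arm design can be sketched as a function that decides which moment in time each arm's briefing is drawn from. This is an illustration under stated assumptions: the arm names, the `briefing_time` helper, and the 150 to 210 day range (a rough stand-in for "5-7 months") are not taken from the experiment code.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

ARMS = ("treatment", "control", "web_search", "placebo")

def briefing_time(arm: str, tick_time: datetime,
                  rng: random.Random) -> Optional[datetime]:
    """Return the timestamp whose briefing the arm sees, or None if it sees none."""
    if arm == "treatment":
        return tick_time  # real-time briefing
    if arm == "placebo":
        # Same briefing format, but stale; assumed shift of 150-210 days.
        # The agent is still told the briefing is current.
        return tick_time - timedelta(days=rng.randint(150, 210))
    if arm in ("control", "web_search"):
        return None  # price only, or web search instead of a briefing
    raise ValueError(f"unknown arm: {arm}")
```

Keeping the placebo in the same format as the treatment is what separates "does structure help" from "does fresh information help".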

Latest run, 202 ticks over 6 months. BTC fell 34.7%.

  Treatment (briefings):   +7.83%  | max drawdown 5.95%
  Control (price only):    -8.14%  | max drawdown 15.95%
  Web search arm:          -1.55%  | max drawdown 12.63%
  Placebo (stale data):    -7.70%  | max drawdown 10.17%
  BTC buy-and-hold:       -34.70%
Treatment beat control by +15.97pp. Beat web search by +9.38pp. All 7 experiments positive, range +4.46pp to +15.97pp across two models (Opus 4.6, Sonnet 4.5).
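For anyone checking the arithmetic, the percentage-point gaps follow directly from the table, and max drawdown is the largest peak-to-trough drop of the equity curve. The sketch below uses the reported returns; the `max_drawdown` helper and its sample curve are illustrative and not data from the run.

```python
# Reported returns from the latest run, in percent
returns = {
    "treatment": 7.83,
    "control": -8.14,
    "web_search": -1.55,
    "placebo": -7.70,
}

def edge_pp(arm: str, baseline: str) -> float:
    """Difference between two arms in percentage points."""
    return round(returns[arm] - returns[baseline], 2)

def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

print(edge_pp("treatment", "control"))     # 15.97
print(edge_pp("treatment", "web_search"))  # 9.38
```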

The edge is almost entirely defensive. Treatment's return came from two short campaigns during crashes. In rallies and sideways markets, it matched or underperformed control. Long trades were coin flips.

What didn't work: the earliest run was the worst. Treatment finished last. Rich data with no guardrails caused the agent to flip-flop every tick. BUY, SELL, BUY across three consecutive ticks. $79K traded, zero net position change. A later run was aborted at tick 33 after the agent translated "macro bearish" into "go short" when the right move was cash. 1 of 24 total runs was negative. 5 were inconclusive.
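The post does not say which guardrails were added after these failures, so the following is only one plausible shape for them: a cooldown that blocks reversing a trade within a few ticks, and a rule that turns a short signal into an exit to cash. The `TradeGuard` class, its action names, and the default of 3 ticks are all hypothetical.

```python
class TradeGuard:
    """Filters an agent's proposed action before it reaches the exchange."""

    ACTIONS = ("BUY", "SELL", "SHORT", "HOLD")

    def __init__(self, cooldown_ticks: int = 3, allow_short: bool = False):
        self.cooldown_ticks = cooldown_ticks
        self.allow_short = allow_short
        self._last_action = None  # last action that was actually executed
        self._last_tick = None

    def filter(self, tick: int, action: str) -> str:
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action}")
        # Treat a bearish signal as "move to cash" unless shorting is enabled
        if action == "SHORT" and not self.allow_short:
            action = "SELL"
        if action == "HOLD":
            return "HOLD"
        reversing = self._last_action is not None and action != self._last_action
        too_soon = (self._last_tick is not None
                    and tick - self._last_tick < self.cooldown_ticks)
        if reversing and too_soon:
            return "HOLD"  # block the flip-flop
        self._last_action, self._last_tick = action, tick
        return action
```

With the default cooldown, a BUY, SELL, BUY sequence over three consecutive ticks comes out as BUY, HOLD, BUY, so the position is opened once instead of being churned.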

Stale data was worse than no data. Placebo consistently underperformed plain control across runs. Well-structured wrong information is more dangerous than no information.

Things I'm still uncertain about: the edge is untested in a bull market (every window skews bearish), 202 ticks isn't statistically conclusive within a single run (years of data and ticks would be more valuable), and the web search arm had contamination risk from future-dated search results.
