frontpage.

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
1•PaulHoule•1m ago•0 comments

Show HN: AI Watermark and Stego Scanner

https://ulrischa.github.io/AIWatermarkDetector/
1•ulrischa•2m ago•0 comments

Clarity vs. complexity: the invisible work of subtraction

https://www.alexscamp.com/p/clarity-vs-complexity-the-invisible
1•dovhyi•3m ago•0 comments

Solid-State Freezer Needs No Refrigerants

https://spectrum.ieee.org/subzero-elastocaloric-cooling
1•Brajeshwar•3m ago•0 comments

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•4m ago•1 comment

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•4m ago•0 comments

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
4•c420•5m ago•0 comments

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•5m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
1•HotGarbage•6m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•6m ago•1 comment

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•7m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
3•surprisetalk•11m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
3•TheCraiggers•12m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•13m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
7•doener•13m ago•2 comments

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

https://github.com/vgrippa/myflames
1•tanelpoder•14m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•15m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
3•tanelpoder•16m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•16m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
2•elsewhen•20m ago•0 comments

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•21m ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•25m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
2•mooreds•25m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•25m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•26m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•26m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•26m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•27m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•27m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
3•nick007•28m ago•0 comments

Show HN: Stateful LLM inference (no cost for input tokens, not prompt-caching)

2•arkonrad•5mo ago
Hi HN,

I’ve been frustrated for a while with how LLM inference works in the cloud today. Every API call starts from scratch: you resend your entire prompt + conversation history, and you’re charged for every input token, even if the model has already “seen” that context before.
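For contrast, here is a minimal sketch of that stateless pattern, using a generic OpenAI-style chat-completions shape. The endpoint, model name, and response fields are placeholders, not any particular vendor's API:

    import requests

    API_URL = "https://api.example-llm.com/v1/chat/completions"  # placeholder endpoint
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(user_text):
        # Stateless pattern: the entire history is resent on every call,
        # and every one of those tokens is billed again as input.
        history.append({"role": "user", "content": user_text})
        resp = requests.post(API_URL, json={"model": "some-model", "messages": history})
        reply = resp.json()["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply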

This leads to two big problems:

Performance & cost – constantly resending input tokens is wasteful.

Quality loss – because the state is rebuilt on a new GPU each time, the model loses a lot of internal context beyond just your text.

Most “optimizations” offered in the industry are really just prompt-caching. That’s useful for cutting repeated input costs, but we’ve all seen the side-effects: outputs that don’t match subtle variations in the prompt, or the model confidently “jumping” to the wrong cached response because it thought your query was a near-duplicate.

We’re taking a different approach with ark-labs.cloud:

True stateful inference – when you start a session, all requests are processed on the same set of GPUs, and the full internal state of the model (prompt, history, reasoning traces) is preserved between calls.

Zero input token cost – because the model doesn’t need you to resend your input on each request. You pay only for generated output.

Better responses, not just cheaper ones – maintaining the internal state can improve consistency and reasoning quality, not just save money.

From a developer perspective, it’s simple: enable cookies, and the API will keep a session alive (ark_session_id). No SDK magic, no hacks. Sessions do expire after inactivity to free resources, but while they’re active, you’re talking to a model that actually remembers internally, not just through string concatenation of prompts.
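A rough sketch of what that might look like from Python. Only the ark_session_id cookie is from the description above; the endpoint path and payload fields here are placeholders, not the documented API, so check the docs link below for the real interface:

    import requests

    BASE = "https://ark-labs.cloud/api"   # hypothetical base path; see the docs below
    s = requests.Session()                # the cookie jar keeps ark_session_id across calls

    # First call: the server pins a session and sets the ark_session_id cookie.
    r1 = s.post(f"{BASE}/generate",
                json={"input": "Summarize the attached report.", "max_tokens": 300})
    print(r1.json())

    # Later calls ride the same cookie, so they land on the same GPUs and reuse the
    # preserved state; no need to resend the earlier context, no input tokens billed.
    r2 = s.post(f"{BASE}/generate",
                json={"input": "Now list the three biggest risks.", "max_tokens": 200})
    print(r2.json())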

Docs https://ark-labs.cloud/documentation/

We’d love your thoughts — especially from those who’ve wrestled with the “why am I paying 10x for tokens I already sent” problem, or who’ve hit caching systems that mismatched prompts to outputs. Does this approach make sense to you?

Comments

NitpickLawyer•5mo ago
> Most “optimizations” offered in the industry are really just prompt-caching. That’s useful for cutting repeated input costs, but we’ve all seen the side-effects: outputs that don’t match subtle variations in the prompt, or the model confidently “jumping” to the wrong cached response because it thought your query was a near-duplicate.

Perhaps you misspoke or misquoted some internal copy, but that doesn't mean what you think it means: the "caching" in KV caching is not what you're implying here. The model doesn't "jump" to anything because of KV caching.
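To be concrete: prefix/KV caching reuses computation only for an exact shared token prefix, and the output distribution is identical to the uncached case. A toy sketch of the idea (not any real engine's implementation):

    # Toy prefix cache: reuse per-token KV state only for an exact token-prefix match;
    # the remaining suffix is recomputed and the model's output is unchanged.
    kv_cache = {}   # tuple(token prefix) -> opaque KV state

    def prefill(tokens, step):
        # find the longest prefix whose KV state we already have
        hit = 0
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in kv_cache:
                hit = i
                break
        state = kv_cache.get(tuple(tokens[:hit]))
        # recompute only the rest; nothing is "guessed" or fuzzily matched
        for i in range(hit, len(tokens)):
            state = step(state, tokens[i])
            kv_cache[tuple(tokens[:i + 1])] = state
        return state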

> From a developer perspective, it’s simple: enable cookies, and the API will keep a session alive

How is this related to LLM inference?! What are cookies doing there? What?

(from your docs) > OpenAI optimizes by processing every single request on randomly selected GPUs - but in the process most of the state is lost because only the final assistant reply is kept. Ark allows users to have a session during which all requests are processed on the same set of GPUs and the full internal state is maintained between requests. Depending on use case, this approach can improve both model's response quality and performance.

Yeah, except no. Every model builder so far has emphasised that this is not how you want to do it. With "thinking" models, you want to NOT include the thinking steps for earlier messages, since that degrades the model's outputs.
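Concretely, the usual practice is to resend prior assistant turns with their thinking traces stripped. A rough sketch, where the "reasoning" field name is purely illustrative since providers use different keys:

    def strip_prior_reasoning(messages):
        # Drop thinking traces from earlier assistant turns before resending.
        cleaned = []
        for msg in messages:
            if msg.get("role") == "assistant":
                msg = {k: v for k, v in msg.items() if k != "reasoning"}
            cleaned.append(msg)
        return cleaned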

----

If you want to convince people of a better way of doing things, when the entire industry is doing another thing, you have to come up with data supporting your stance. Can you show such data? Do you have qualitative studies / benchmarks on your methods? Can you show that whatever state you hold is actually helping? That would go against the current practices of every inference engine out there, so it would be quite a thing to show.

arkonrad•5mo ago
On cookies: we use an HTTP cookie (ark_session_id) purely as an opaque session identifier. The cookie is how the client ties subsequent requests to the same pinned session/worker/GPUs on the provider side, so the provider can keep the model activations/state in GPU memory between calls. It's not magic for the model; it's a routing key that enables true session affinity.
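In other words, it works like any sticky-session key. A toy sketch of the routing idea (purely illustrative, not our actual implementation):

    import uuid

    WORKERS = ["gpu-pool-a", "gpu-pool-b", "gpu-pool-c"]
    sessions = {}   # ark_session_id -> pinned worker

    def route(cookies):
        sid = cookies.get("ark_session_id") or str(uuid.uuid4())
        # New sessions get pinned to a worker; repeat requests carrying the same
        # cookie go back to that worker, whose GPU memory still holds the state.
        worker = sessions.setdefault(sid, WORKERS[hash(sid) % len(WORKERS)])
        return sid, worker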

On “thinking steps” and contamination: good point - naively persisting raw chain-of-thought tokens can degrade outputs. ARKLABS' stateful approach is not a blanket “store everything” policy.

And my criticism targets higher-level provider practices: things like response caching, aggressive prompt-matching / deduplication heuristics, or systems that return previously generated outputs when a new prompt is “similar enough.” Those high-level caches absolutely can produce the behaviour I described - a subtle prompt change that nevertheless gets routed to a cached reply.

The platform has been launched — we’re collecting data, but early results are very promising: we’re seeing linear complexity, lower latency, and ~80% input-token savings. At the same time we’d love to hear more feedback on whether this approach could be useful in real-world projects.

And about going against the grain, as you mentioned at the end… well — if startups didn’t think differently from everyone else, what would be the point of being a startup?