Ask HN: How to increase LLM inference speed?

3•InkCanon•10h ago
Hi HN,

I'm building software that has a very tight feedback loop with the user. One part involves a short (few hundred tokens) response from an LLM. This is by far the biggest UX problem: currently DeepSeek's total response time can reach 10 seconds, which is horrific. Would it be practical to reduce the latency to maybe ~2 seconds? The LLM is just asked to rephrase a short text while preserving its meaning, so it does not need to be SOTA. On the whole, faster inference is much more important than model quality.
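For a rough sense of what that target implies, here is a back-of-the-envelope sketch. The 300-token response length and the 0.5 s time to first token are assumed figures for illustration, not measurements from any provider, and `required_tokens_per_second` is a hypothetical helper.

```python
def required_tokens_per_second(output_tokens: int, budget_s: float, ttft_s: float = 0.0) -> float:
    """Decode throughput needed to emit `output_tokens` within `budget_s`,
    after spending `ttft_s` waiting for the first token."""
    decode_window = budget_s - ttft_s
    if decode_window <= 0:
        raise ValueError("time to first token already uses up the whole budget")
    return output_tokens / decode_window


# Assumed numbers: "a few hundred tokens" taken as 300, time to first token as 0.5 s.
current = required_tokens_per_second(300, budget_s=10.0, ttft_s=0.5)
target = required_tokens_per_second(300, budget_s=2.0, ttft_s=0.5)

print(f"10 s total corresponds to about {current:.1f} tokens/s")
print(f"2 s total would need about {target:.1f} tokens/s")
```

Under these assumptions, the 2-second target needs roughly six times the decode throughput of the 10-second case, so any time spent before the first token matters a lot at this budget.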

Comments

cranberryturkey•10h ago
You need a faster GPU, but that only works for self-hosted LLMs (e.g. Ollama/Hugging Face).
