AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•33s ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
2•endorphine•5m ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•9m ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•10m ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
1•computer23•12m ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•13m ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•16m ago•0 comments

CoreWeave's $30B Bet on GPU Market Infrastructure

https://davefriedman.substack.com/p/coreweaves-30-billion-bet-on-gpu
1•gmays•27m ago•0 comments

Creating and Hosting a Static Website on Cloudflare for Free

https://benjaminsmallwood.com/blog/creating-and-hosting-a-static-website-on-cloudflare-for-free/
1•bensmallwood•33m ago•1 comment

"The Stanford scam proves America is becoming a nation of grifters"

https://www.thetimes.com/us/news-today/article/students-stanford-grifters-ivy-league-w2g5z768z
1•cwwc•37m ago•0 comments

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

https://cheekypint.substack.com/p/elon-musk-on-space-gpus-ai-optimus
2•simonebrunozzi•46m ago•0 comments

X (Twitter) is back with a new X API Pay-Per-Use model

https://developer.x.com/
3•eeko_systems•53m ago•0 comments

Zlob.h: 100% POSIX and glibc compatible globbing lib that is fast and better

https://github.com/dmtrKovalenko/zlob
3•neogoose•56m ago•1 comment

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

https://github.com/mabrucker85-prog/Project_Lance_Core
2•mav5431•56m ago•1 comment

Scientists Discover Levitating Time Crystals You Can Hold, Defy Newton’s 3rd Law

https://phys.org/news/2026-02-scientists-levitating-crystals.html
3•sizzle•56m ago•0 comments

When Michelangelo Met Titian

https://www.wsj.com/arts-culture/books/michelangelo-titian-review-the-renaissances-odd-couple-e34...
1•keiferski•58m ago•0 comments

Solving NYT Pips with DLX

https://github.com/DonoG/NYTPips4Processing
1•impossiblecode•58m ago•1 comment

Baldur's Gate to be turned into TV series – without the game's developers

https://www.bbc.com/news/articles/c24g457y534o
2•vunderba•58m ago•0 comments

Interview with 'Just use a VPS' bro (OpenClaw version) [video]

https://www.youtube.com/watch?v=40SnEd1RWUU
2•dangtony98•1h ago•0 comments

EchoJEPA: Latent Predictive Foundation Model for Echocardiography

https://github.com/bowang-lab/EchoJEPA
1•euvin•1h ago•0 comments

Disabling Go Telemetry

https://go.dev/doc/telemetry
1•1vuio0pswjnm7•1h ago•0 comments

Effective Nihilism

https://www.effectivenihilism.org/
1•abetusk•1h ago•1 comment

The UK government didn't want you to see this report on ecosystem collapse

https://www.theguardian.com/commentisfree/2026/jan/27/uk-government-report-ecosystem-collapse-foi...
5•pabs3•1h ago•0 comments

No 10 blocks report on impact of rainforest collapse on food prices

https://www.thetimes.com/uk/environment/article/no-10-blocks-report-on-impact-of-rainforest-colla...
3•pabs3•1h ago•0 comments

Seedance 2.0 Is Coming

https://seedance-2.app/
1•Jenny249•1h ago•0 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•1h ago•0 comments

Dexterous robotic hands: 2009 – 2014 – 2025

https://old.reddit.com/r/robotics/comments/1qp7z15/dexterous_robotic_hands_2009_2014_2025/
1•gmays•1h ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•ksec•1h ago•1 comment

JobArena – Human Intuition vs. Artificial Intelligence

https://www.jobarena.ai/
1•84634E1A607A•1h ago•0 comments

Concept Artists Say Generative AI References Only Make Their Jobs Harder

https://thisweekinvideogames.com/feature/concept-artists-in-games-say-generative-ai-references-on...
1•KittenInABox•1h ago•0 comments

LLM Inference Handbook

https://bentoml.com/llm/
366•djhu9•7mo ago

Comments

sherlockxu•7mo ago
Hi everyone. I'm one of the maintainers of this project. We're both excited and humbled to see it on Hacker News!

We created this handbook to make LLM inference concepts more accessible, especially for developers building real-world LLM applications. The goal is to pull together scattered knowledge into something clear, practical, and easy to build on.

We’re continuing to improve it, so feedback is very welcome!

GitHub repo: https://github.com/bentoml/llm-inference-in-production

armcat•7mo ago
Amazing work on this, beautifully put together and very useful!
DiabloD3•7mo ago
I'm not going to open an issue for this, but you should consider expanding the self-hosting part of the handbook and explicitly recommending llama.cpp for local self-hosted inference.
leopoldj•7mo ago
The self-hosting section covers the corporate use case with vLLM and SGLang, as well as personal desktop use with Ollama, which is a wrapper over llama.cpp.
DiabloD3•7mo ago
Recommending Ollama isn't useful for end users; it's just a trap in a nice-looking wrapper.
nl•7mo ago
Strong disagree on this. Ollama is great for moderately technical users who aren't really programmers or proficient with the command line.
DiabloD3•7mo ago
You can disagree all you want, but Ollama does not keep its vendored copy of llama.cpp up to date, and it also ships, via its mirror, completely random, badly labeled models claiming to be the real upstream ones, often misappropriated from major community members (Unsloth, et al.).

When you get a model offered by Ollama's service, you have no clue what you're getting, and normal people who have no experience aren't even aware of this.

Ollama is an unrestricted footgun because of this.

nl•6mo ago
I thought the models were like HuggingFace, where anyone can upload a model and you choose which one you pull. The Unsloth ones look like this to me, e.g.: https://ollama.com/secfa/DeepSeek-R1-UD-IQ1_S
DiabloD3•6mo ago
Ollama themselves upload models to the mirror, and often mislabel them.

When R1 first came out, for example, their official copy of it was one of the distills labeled as "R1" instead of something like "R1-qwen-distill". They've done this more than once.

ChromaticPanic•6mo ago
Not the footgun you think it is. Ollama comes with a few things that make it convenient for casual users.
criemen•7mo ago
Thanks a lot for putting this together!

I have a question. In https://github.com/bentoml/llm-inference-in-production/blob/..., you have a single picture that defines both TTFT and ITL. That doesn't match my understanding (but you probably know more than me): in the graphic, it looks like the model generates four tokens, T0 to T3, before outputting a single output token.

I'd have expected that picture for ITL (except that then the labeling of the last box is off), but for TTFT I'd have expected there to be only a single token T0 from the decode step, which is then immediately handed to detokenization and arrives as the first output token (assuming a streaming setup; otherwise measuring TTFT makes little sense).
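
For reference, in a streaming setup both metrics can be measured directly from chunk arrival times on the client. Below is a minimal sketch, assuming an OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders rather than anything from the handbook, and each streamed chunk is treated as roughly one token, which is an approximation.

    import time
    from openai import OpenAI  # pip install openai

    # Placeholder endpoint and model, not values from the handbook.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

    start = time.perf_counter()
    arrivals = []  # arrival time of each streamed content chunk

    stream = client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            arrivals.append(time.perf_counter())

    ttft = arrivals[0] - start  # time to first token
    itl = [b - a for a, b in zip(arrivals, arrivals[1:])]  # inter-token latencies
    print(f"TTFT: {ttft:.3f}s, mean ITL: {sum(itl) / len(itl):.3f}s")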

sherlockxu•6mo ago
Thanks. We have updated the image to make it more accurate.
sethherr•7mo ago
This seems useful and well put together, but splitting it into many small pages instead of a single page that can be scrolled through is frustrating, particularly on mobile, where the table of contents isn't shown by default. I stopped reading after a few pages because it annoyed me.

At the very least, the sections should be a single page each.

aligundogdu•7mo ago
It's a really beautiful project, and I'd like to ask something purely out of curiosity and with the best intentions: what's the name of the design trend you used for your website? I really love it.
Jimmc414•7mo ago
It appears to be using Infima (Docusaurus's default CSS framework) plus a standard system font stack [0].

[0] font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;

aligundogdu•6mo ago
Thank you.
holografix•7mo ago
Very good reference, thanks for collating this!
subset•7mo ago
Ooh, this looks really neat! I'd love to see more content in the future on structured outputs/guided generation and sampling. Another great reference on inference-time sampling algorithms is here: https://rentry.co/samplers
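
As a rough illustration of the kind of sampling material that could go there, here is a minimal top-p (nucleus) sampling sketch; it is a generic NumPy illustration, not code from the handbook, and the toy distribution is made up.

    import numpy as np

    def top_p_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
        """Sample a token id from `probs` using nucleus (top-p) sampling."""
        rng = rng or np.random.default_rng()
        order = np.argsort(probs)[::-1]          # token ids, most probable first
        cumulative = np.cumsum(probs[order])
        # Smallest prefix whose cumulative probability reaches p.
        cutoff = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
        nucleus = probs[order][:cutoff]
        nucleus = nucleus / nucleus.sum()        # renormalize the kept mass
        return int(order[rng.choice(cutoff, p=nucleus)])

    # Toy next-token distribution over a 5-token vocabulary.
    probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
    print(top_p_sample(probs, p=0.9))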
larme•7mo ago
Wow that's really thorough
aarnphm•6mo ago
Thanks for the recommendation. I'm actually working on something similar for this part of the docs (I also work at BentoML).
qrios•7mo ago
Thanks for putting this together! From now on I only need one link to point interested people to.

Only one suggestion: on the "OpenAI-compatible API" page, it would be great to also have a simple example of the raw REST call, without needing to import the OpenAI package.
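
For anyone looking for the shape of that raw REST call in the meantime, here is a minimal sketch against a generic OpenAI-compatible endpoint; the URL, API key, and model name below are placeholders, not taken from the handbook, and a typical OpenAI-compatible server accepts this request shape at /v1/chat/completions.

    import requests  # pip install requests

    # Placeholder endpoint, key, and model for an OpenAI-compatible server.
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        headers={"Authorization": "Bearer dummy", "Content-Type": "application/json"},
        json={
            "model": "my-model",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])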

sherlockxu•6mo ago
Thanks. We just added the example.
srameshc•7mo ago
If I remember correctly, BentoML was about MLOps; I remember trying it about a year back. Did the company pivot?
fsjayess•7mo ago
There is a big pie in the market around LLM serving. It makes sense for a serving framework to extend into that space.
aarnphm•6mo ago
Hi srameshc, the core of BentoML is still MLOps, and a lot of our customers are pretty much MLOps users. However, LLMOps seems like a natural progression of the product, given that many of our users now want to experiment and build with LLM-based services.
gchadwick•7mo ago
Very glad to see this. There is (understandably) much excitement and focus on training models in publicly available material.

Running them well is very important too. As we get to grips with everything models can do and look to deploy them widely, knowledge of how best to run them becomes ever more important.