
Client-side GPU load balancing with Redis and Lua

https://galileo.ai/blog/how-we-boosted-gpu-utilization-by-40-with-redis-lua
54•lneiman•2mo ago

Comments

lneiman•2mo ago
Author here. We were hitting tail latency and low GPU utilization issues serving SLMs via Triton.

I built a scrappy client-side router using Redis and Lua to track real-time GPU load. It boosted utilization by ~40% and improved latencies.

Happy to hear feedback on the implementation or thoughts on better ways to do this!
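
For readers following along, here is a minimal sketch of the general idea of routing by tracked load. It is not the article's code: the thread does not show the actual Lua script or Redis key layout, so the `LoadTable` class, its method names, and the instance names are hypothetical. A Python lock stands in for the atomicity a Lua script gets when it runs inside Redis.

```python
import threading


class LoadTable:
    """In-process stand-in for a shared table of per-GPU load scores.

    Assumption: in a real deployment this state would live in Redis and
    the body of acquire() would run as one atomic server-side script.
    """

    def __init__(self, instances):
        self._lock = threading.Lock()  # plays the role of script atomicity
        self._load = {name: 0 for name in instances}

    def acquire(self, cost=1):
        """Atomically pick the least-loaded instance and add `cost` to it."""
        with self._lock:
            target = min(self._load, key=self._load.get)
            self._load[target] += cost
            return target

    def release(self, target, cost=1):
        """Subtract `cost` again once the request has finished."""
        with self._lock:
            self._load[target] = max(0, self._load[target] - cost)

    def snapshot(self):
        """Return a copy of the current load scores."""
        with self._lock:
            return dict(self._load)
```

The pick and the increment happen in one step, so two clients that call `acquire()` at the same moment cannot both choose the same idle instance based on a stale read.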

pbrumm•2mo ago
Have you tried switching it to a job queue where the GPU instances try to keep themselves busy? That way you can autoscale the GPUs based on utilization. I find it easier to tune, and you can monitor latency and backlogs more easily. It does require some async mechanisms in the client, but I have found it easier to maintain.
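
A rough sketch of the pull model described above, under stated assumptions: threads stand in for GPU instances, `handler` stands in for inference, and `run_pull_workers` is a hypothetical helper name. A real system would use an external queue and long-lived workers.

```python
import queue
import threading


def run_pull_workers(jobs, worker_names, handler):
    """Each worker pulls the next job as soon as it is free."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # backlog drained, so this worker exits
            output = handler(job)
            with results_lock:
                results.append((name, job, output))

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No router has to guess which instance is free, because a busy worker simply does not pull. The queue depth (`pending.qsize()` in this sketch) is the kind of backlog signal that could drive autoscaling.
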

artyom•2mo ago
If I understand the article correctly, any sufficiently capable attacker can:

- Know the global state of your GPU cluster via the client.

- Target the most struggling GPU instances specifically since the client decides which one to hit.

You offer a free tier which means anyone can get an account and try to do it (e.g. you can have one "harmless, mostly inactive" free account with the only purpose of retrieving GPU cluster status, and a bunch of burner accounts to overload struggling instances).

I may be completely wrong, but this sounds like DDoS served on a silver plate to me.

singron•2mo ago
They run these clients themselves and the Redis instance isn't publicly exposed.

It would indeed be very strange to hope your random users coordinate with your client side load balancer. You wouldn't even have to send real traffic. You could just manipulate redis directly to force all the real traffic to go to a single node. DoSing redis itself is also pretty easy.

artyom•2mo ago
I don't think the article implied that the client was for some sort of internal server-to-server communication, or that the Redis instance was directly exposed to the internet.

So no, I don't think they run these clients themselves. If the code runs out there, it's open to inspection.

tpurves•2mo ago
Either way, you are right to point out that it is important to only try a pattern like this if your clients are highly trusted (and/or you have additional compensating controls against DDoS threats). It would be beneficial if the OP made more explicit what their client/server relationships are, and also flagged the risk you mentioned so that general audiences don't implement such a solution in the wrong places.

PunchyHamster•2mo ago
I'm gonna guess just switching from round-robin to leastconn (most balancers offer that option) would solve that just fine. You can then go to dynamically tune server weights if you have servers of unequal size or some other issues.
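
To illustrate the difference, here is a toy simulation (a sketch with made-up request durations, not a model of any real balancer). One request arrives per tick, and the function reports the highest number of in-flight requests any server reached. The names `simulate`, `round_robin`, and `least_conn` are hypothetical.

```python
import itertools


def round_robin(servers):
    """Picker that cycles through servers regardless of their load."""
    cycle = itertools.cycle(servers)
    return lambda active: next(cycle)


def least_conn(servers):
    """Picker that chooses the server with the fewest in-flight requests."""
    return lambda active: min(servers, key=lambda s: active[s])


def simulate(pick, servers, durations):
    """Feed one request per tick and return the peak per-server concurrency."""
    active = {s: 0 for s in servers}
    in_flight = []  # (finish_tick, server) pairs
    peak = 0
    for tick, duration in enumerate(durations):
        # Retire requests that have finished by this tick.
        still_running = []
        for finish, server in in_flight:
            if finish <= tick:
                active[server] -= 1
            else:
                still_running.append((finish, server))
        in_flight = still_running

        server = pick(active)
        active[server] += 1
        in_flight.append((tick + duration, server))
        peak = max(peak, max(active.values()))
    return peak


servers = ["a", "b"]
durations = [10, 1] * 10  # alternating long and short requests
rr_peak = simulate(round_robin(servers), servers, durations)
lc_peak = simulate(least_conn(servers), servers, durations)
```

With this alternating pattern, round-robin sends every long request to the same server, so its peak concurrency ends up higher than with least-connections, which spreads the long requests across both.
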

gorkish•2mo ago
Yeah, I really don't understand why they went this direction, as it builds considerable additional complexity directly into the application to solve a problem with an external component.

I would probably have approached this by implementing a fix for the misbehaving part of k8s, though since there isn't a default LoadBalancer in k8s, I really can't speculate further as to the root cause of the initial problem. But most CNI or cloud providers that implement LB do have a way to take feedback from an external metric. I'd be curious why doing it this way wasn't considered, at least.

kgeist•2mo ago
Yeah, that can work. Just yesterday I benchmarked load balancing of LLM workloads across 2 GPUs using a simple least_conn from nginx. The total token/sec scaled as expected (2 GPUs => 2x token/sec), and GPU utilization reached 100% on both, as I increased concurrency from 1 to 128 simultaneous generations.

bnr4u•2mo ago
Very cool work! Did you investigate using the "power of two random choices" method for your load balancing algorithm?

https://brooker.co.za/blog/2012/01/17/two-random.html

https://medium.com/the-intuition-project/load-balancing-the-...
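
For context, a minimal sketch of the technique: sample two servers at random and send the request to the less loaded of the pair. The function names and the comparison against purely random assignment are illustrative assumptions, and load here is simply a count of assigned requests.

```python
import random


def pick_random(load, rng):
    """Baseline: choose any server uniformly at random."""
    return rng.choice(list(load))


def pick_two_choices(load, rng):
    """Sample two distinct servers and return the less loaded one."""
    a, b = rng.sample(list(load), 2)
    return a if load[a] <= load[b] else b


def assign(n_requests, n_servers, picker, seed=0):
    """Assign requests one at a time and return the final load per server."""
    rng = random.Random(seed)
    load = {i: 0 for i in range(n_servers)}
    for _ in range(n_requests):
        load[picker(load, rng)] += 1
    return load


random_load = assign(10_000, 100, pick_random)
two_choice_load = assign(10_000, 100, pick_two_choices)
```

Comparing `max(random_load.values())` with `max(two_choice_load.values())` shows the appeal: each decision reads only two load values rather than the whole cluster state, yet the busiest server ends up much closer to the average.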

maknee•2mo ago
It seems like the load_score serves as a proxy for how much work needs to be done. Is there a real value that could be used instead? The solution requires syncing with all of the GPU nodes anyway.