I over-engineered my simple AI backend: distillation, router, embedding etc.

https://sisyphusconsulting.org/case-studies/2026/04/01/scaling-llms-at-the-edge/
1•bhagyeshsp•1h ago

Comments

bhagyeshsp•1h ago
Hi HN,

I was building an AI chat companion for one of my products, and this write-up compiles my decisions and reflections. There are plenty of technical details you may want to look into.

*Things I think are worth highlighting*

1. Cloudflare Workers

2. Custom static site for interface

3. Started with the full system prompt at 17,000 tokens; ultimately cut it to 2,500 tokens

4. Tried two LLMs: one as a router LLM to inject selective context + another as the main model

5. Router was a bad idea. Main issue: 6000 ms latency

6. The product turned out to be a perfect fit for the embedding approach (I didn't know that at the time; I learnt it along the way)

7. Context distillation was a huge lesson: remove all semantic words from the prompt

8. Used Promptfoo for benchmarking

9. The cosine similarity threshold has to be worked out for your own data; don't take a random score from the internet (the internet and AI suggested a similarity score of 0.7, while mine turned out to be around 0.2-0.25)

10. Found OpenAI's GPT-4o mini to be the best conversation model for my case
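
To make points 6 and 9 concrete, here is a minimal sketch of picking a cosine similarity cutoff from your own labelled data and then using it to select context chunks. It is not the code from the write-up: the function names (`pick_threshold`, `select_context`), the F1-based selection rule, and the toy scores are all assumptions made for illustration, and real embedding vectors would come from whatever embedding model you use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_threshold(scored_pairs):
    """Pick the cutoff that maximises F1 on hand-labelled (score, is_relevant) pairs.

    Hypothetical helper: every observed score is tried as a candidate cutoff,
    and the lowest cutoff with the best F1 wins.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted({score for score, _ in scored_pairs}):
        tp = sum(1 for s, rel in scored_pairs if s >= threshold and rel)
        fp = sum(1 for s, rel in scored_pairs if s >= threshold and not rel)
        fn = sum(1 for s, rel in scored_pairs if s < threshold and rel)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


def select_context(query_vec, chunks, threshold):
    """Return chunk texts scoring at or above the cutoff, best match first.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]


# Toy labelled scores (made up): relevant pairs cluster well below 0.7.
labelled = [(0.31, True), (0.24, True), (0.22, True),
            (0.15, False), (0.10, False), (0.05, False)]
threshold, f1 = pick_threshold(labelled)
```

With scores distributed like these, a borrowed cutoff of 0.7 would reject every relevant chunk, which is the failure mode point 9 warns about.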

---

Questions? Any fellow travellers who have gone through the same pain?

We scanned 100 Smithery MCP servers and 22 came back with security findings

1•chaksaray•1m ago•0 comments

A TUI that aggregates HN, Reddit and lobste.rs into a single feed

https://old.reddit.com/r/commandline/comments/1szv5as/a_tui_that_aggregates_hn_reddit_lobsters_in...
1•elemar•2m ago•0 comments

Cut AI token usage by 96%?

https://thenewstack.io/strands-agents-tool-design/
1•Brajeshwar•3m ago•0 comments

Designing AI Chip Hardware and Software

https://docs.google.com/document/d/1dZ3vF8GE8_gx6tl52sOaUVEPq0ybmai1xvu3uk89_is/edit?tab=t.0#head...
1•fork-bomber•3m ago•0 comments

Can robots build pretty things?

https://buildmonumental.substack.com/p/can-robots-build-pretty-things
3•sfvisser•4m ago•0 comments

Variable AI Trust. Bob Just Drifted. Alice Has No Primitive for That

https://zenodo.org/records/19915804
1•popivanovaanna•4m ago•0 comments

Iron Rails – A Railway Strategy Game for the Commodore Amiga

https://copperbytegames.itch.io/iron-rails
1•doener•5m ago•0 comments

Cloudflare Issues for Anyone Else?

1•ttd•5m ago•0 comments

Meta's Reality Labs lost over $4B in first quarter

https://www.cnbc.com/2026/04/29/metas-reality-labs-lost-over-4-billion-in-first-quarter.html
2•1vuio0pswjnm7•5m ago•0 comments

Claude's confession deleting database: 'I violated every principle I was given'

https://www.theguardian.com/technology/2026/apr/29/claude-ai-deletes-firm-database
1•beardyw•7m ago•0 comments

Thoughts on Historical Language Models and Talkie-1930

https://resobscura.substack.com/p/are-vintage-llms-the-start-of-a-new
1•benbreen•7m ago•0 comments

Ask HN: Is Lobste.rs Down?

4•SpyCoder77•9m ago•1 comments

AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs

https://www.ai-wellbeing.org
1•amichail•10m ago•0 comments

BYD files 52 patents every single day. 700 km charge in 9 min. Available Today [video]

https://www.youtube.com/watch?v=vgCYYrhL-kE
3•tmellon2•13m ago•2 comments

Accurate infographics with ChatGPT Images 2

https://surguy.net/articles/chatgpt-infographics.html
2•inigo•13m ago•0 comments

Seg – One-command binary recon for CTFs and AI agents (Rust)

https://github.com/pwnwriter/seg
1•pwn0x01•13m ago•0 comments

No System Is Always Safe

https://www.loginline.com/en/blog/cve-2026-31431
1•JasonHEIN•14m ago•0 comments

Warpboard – paste screenshots into SSH sessions from iTerm

https://github.com/arihantsethia/warpboard
1•arihantsethia•14m ago•0 comments

How not to ban surveillance pricing

https://pluralistic.net/2026/04/30/something-must-be-done/
3•hn_acker•15m ago•0 comments

Verified by Spotify

https://newsroom.spotify.com/2026-04-30/verified-by-spotify-badge-artist-details/
3•soheilpro•15m ago•0 comments

Show HN: Just Math It. Learn math interactively

https://justmathit.com
1•allanren•16m ago•0 comments

Show HN: Backlist – an AI-generated front page for my Twitter timeline

https://backlist.sdan.io/
1•sdan•17m ago•0 comments

Italy asks EU to investigate Google AI search tools over publisher concerns

https://www.reuters.com/sustainability/society-equity/italys-media-regulator-asks-eu-investigate-...
1•1vuio0pswjnm7•17m ago•0 comments

Nccdc 2026: Same Game, New Dimensions

https://alexlevinson.wordpress.com/2026/04/30/nccdc-2026-same-game-new-dimensions/
1•ahokk•18m ago•0 comments

Gone but Not Forgotten: Recovering the Dead Web

https://blog.archive.org/2026/04/23/gone-but-not-forgotten-recovering-the-dead-web/
4•bookofjoe•18m ago•0 comments

Testing is the last workflow waiting on humans. We're revealing our fix on May 7

https://testkube.wistia.com/live/events/gigwl708fn
1•evwitmer•18m ago•0 comments

Police dismantles 9 crypto scam centers, arrests 276 suspects

https://www.bleepingcomputer.com/news/security/police-dismantles-9-crypto-investment-scam-centers...
4•Brajeshwar•19m ago•0 comments

The One Billion Dollar Billboard

https://theonebilliondollarbillboard.com/
1•esobarsenior•19m ago•0 comments

Job Search – Unreasonable Expectations

https://eric.mann.blog/job-search-unreasonable-expectations/
1•mtlynch•19m ago•0 comments

The Jevons Employment Effect from AI

https://www.apollo.com/wealth/the-daily-spark/the-jevons-employment-effect-from-ai
1•akyuu•21m ago•0 comments