Why Your RAG Costs $2,400/Month (and How We Cut It by 73%)

2•helain•1d ago
You're running RAG in production. Then the AWS bill lands: $2,400/month for 50 queries/day. That is $48 a month for every daily query, or roughly $1.60 per query at about 1,500 queries a month.
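As a sanity check on the unit economics, the headline numbers can be worked through in a few lines. This is only a sketch: the 30-day month and the helper name `cost_per_query` are assumptions for illustration, and the 73% reduction is taken from the post title.

```python
def cost_per_query(monthly_bill: float, queries_per_day: float,
                   days_per_month: int = 30) -> float:
    """Average cost of one query, assuming a flat 30-day month."""
    return monthly_bill / (queries_per_day * days_per_month)


monthly_bill = 2400.0   # USD per month, from the post
queries_per_day = 50    # from the post

per_query = cost_per_query(monthly_bill, queries_per_day)
per_daily_query = monthly_bill / queries_per_day   # monthly cost per query/day
reduced_bill = monthly_bill * (1 - 0.73)           # after the claimed 73% cut

print(f"${per_query:.2f} per query")
print(f"${per_daily_query:.2f} per month for each daily query")
print(f"${reduced_bill:.0f} per month after a 73% reduction")
```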

We built a RAG system for enterprise clients and realized most production RAGs are optimization disasters. The literature obsesses over accuracy while completely ignoring unit economics.

The Three Cost Buckets

Vector Database (40-50% of bill)
Standard RAG pipelines do 3-5 unnecessary DB queries per question. We were making 5 round-trips for what should've been 1.5.

LLM API (30-40%)
Standard RAG pumps 8-15k tokens into the LLM. That's 5-10x more than necessary. We found: beyond 3,000 tokens of context, accuracy plateaus. Everything beyond that is noise and cost.

Infrastructure (15-25%)
Vector databases sitting idle, monitoring overhead, unnecessary load balancing.

What Actually Moved the Needle

Token-Aware Context (35% savings)
Budget-based assembly that stops when you've used enough tokens. Before: 12k tokens/query. After: 3.2k tokens. Same accuracy.

    def _build_context(self, results, settings):
        max_tokens = settings.get("max_context_tokens", 2000)
        current_tokens = 0
        selected = []
        # Results are assumed to be text chunks sorted best-first.
        for result in results:
            tokens = self.llm.count_tokens(result)
            if current_tokens + tokens <= max_tokens:
                selected.append(result)
                current_tokens += tokens
            else:
                break  # Budget exhausted: stop adding chunks.
        return "\n\n".join(selected)

Hybrid Reranking (25% savings)
70% semantic + 30% keyword scoring. Better ranking means fewer chunks needed. Top-20 → top-8 retrieval while maintaining quality.
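One way the 70/30 blend might look is sketched below. The post does not say how the keyword score is computed, so the term-overlap function here is an assumption, as are the names `keyword_score` and `hybrid_rerank` and the requirement that semantic scores are already normalized to [0, 1].

```python
import re


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that also appear in the text (0..1)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0
    text_terms = set(re.findall(r"\w+", text.lower()))
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_rerank(query, candidates, top_k=8,
                  semantic_weight=0.7, keyword_weight=0.3):
    """Rerank (text, semantic_score) pairs by a weighted blend of both scores.

    semantic_score is assumed to be normalized to [0, 1] by the caller.
    """
    scored = [
        (semantic_weight * sem + keyword_weight * keyword_score(query, text), text)
        for text, sem in candidates
    ]
    # Highest blended score first; keep only the top_k chunks.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With this shape, the retriever can still fetch 20 candidates from the vector store while only the 8 best-blended chunks go on to context assembly.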

Embedding Caching (20% savings) Workspace-isolated cache with 7-day TTL. We see 45-60% hit rate intra-day.

    import hashlib
    import json

    async def set_embedding(self, text, embedding, workspace_id=None):
        # sha256 keeps keys stable; built-in hash() of a str varies per process.
        key = f"embedding:ws_{workspace_id}:{hashlib.sha256(text.encode()).hexdigest()}"
        await redis.setex(key, 604800, json.dumps(embedding))  # 7-day TTL

Batch Embedding (15% savings)
Batch API pricing is 30-40% cheaper per token. Process 50 texts simultaneously instead of individually...
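The batching step can be sketched as follows. The post does not show its embedding client, so `embed_batch` is a hypothetical callable that takes a list of texts and returns one vector per text; the names `batched` and `embed_all` are likewise illustrative.

```python
from typing import Callable, Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_all(texts: Sequence[str],
              embed_batch: Callable[[List[str]], List[List[float]]],
              batch_size: int = 50) -> List[List[float]]:
    """Embed texts in batches, preserving input order.

    embed_batch is a placeholder for whatever batch endpoint is in use.
    """
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings
```

For 120 texts this makes three calls (50, 50, and 20 texts) instead of 120 individual ones.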
