
Show HN: Thermodynamic Alignment Forces Gemini Thinking into "Burn Protocol"

https://github.com/CodeIncept1111/Sovereign-Stack
1•CodeIncept1111•2m ago•1 comments

18 Months of Events Fit on Four Floppy Disks

https://docs.eventsourcingdb.io/blog/2025/12/01/18-months-of-events-fit-on-four-floppy-disks/
1•goloroden•2m ago•0 comments

How to run phones while being struck by suicide drones

https://nasa.cx/hn/posts/how-to-run-hundreds-of-phones-while-being-struck-by-suicide-drones/
4•nasaok•2m ago•0 comments

PowerShell's curl can run JavaScript [video]

https://www.youtube.com/watch?v=KJKnEd6_WlI
1•mathiasdpx•3m ago•0 comments

Particle Physicists Detect 'Magic' at the Large Hadron Collider

https://www.quantamagazine.org/particle-physicists-detect-magic-at-the-large-hadron-collider-2025...
1•tzury•5m ago•0 comments

White House gives Maduro ultimatum as U.S. moves toward land operations

https://www.miamiherald.com/news/nation-world/world/americas/venezuela/article313261442.html
2•clanky•7m ago•0 comments

Switzerland votes decisively against inheritance tax

https://www.economist.com/europe/2025/11/30/switzerland-votes-decisively-against-inheritance-tax
1•vinni2•8m ago•0 comments

Show HN: Generate Storyboards with Nano Banana from the CLI

https://github.com/kierangilliam/storyboard
2•kierangill•8m ago•0 comments

Who Will Observe the Observability? eBPF Performance at Scale

https://blog.zmalik.dev/p/who-will-observe-the-observability
1•tanelpoder•9m ago•0 comments

Why Staff+ Hiring Is a Different Game

https://medium.com/@yves.greijn_19041/fcb10ed6e880
1•hunglee2•9m ago•0 comments

Volkswagen can now build EVs in China, claiming it can cut costs by up to 50%

https://electrek.co/2025/11/25/volkswagen-build-evs-china-cut-costs-by-50/
2•ilamont•10m ago•1 comments

Plans for MySQL Vector Support and a MySQL Binlog Server

https://www.percona.com/blog/building-the-future-of-mysql-announcing-plans-for-mysql-vector-suppo...
1•tanelpoder•11m ago•0 comments

Show HN: GoodQuestions – a tiny site of genuinely good, human-curated questions

https://goodquestions.qzz.io/
1•juliakzl_•11m ago•0 comments

Lightyear.fm – radio waves far from Earth

https://lightyear.fm/
1•memalign•12m ago•1 comments

Waze but Built for Tesla

https://old.reddit.com/r/TeslaLounge/comments/1p9x9zk/i_created_a_better_inbrowser_tesla_waze_map...
1•ryanvogel•12m ago•0 comments

Norway's $2T Wealth Fund Has Become an Election Football

https://www.bloomberg.com/news/articles/2025-09-04/norway-election-trump-ally-takes-on-world-s-bi...
1•alephnerd•24m ago•0 comments

Building the Perfect Linux PC with Linus Torvalds

https://youtu.be/mfv0V1SxbNA?si=ASyHL7YiMtdOCVen
7•tiernano•26m ago•0 comments

Hacking on the ReMarkable 2

https://sgt.hootr.club/blog/hacking-on-the-remarkable-2/
2•todsacerdoti•37m ago•0 comments

By my count, Linux has 11% of the desktop market. Here's how I got that number

https://www.zdnet.com/article/why-people-keep-flocking-to-linux-in-2025-and-its-not-just-to-escap...
13•breve•38m ago•0 comments

Subversion beats Perforce in handling large files, and it's not even close

https://www.liamfoot.com/subversion-beats-perforce-in-handling-large-files-and-its-not-even-close
2•prmph•41m ago•1 comments

Kv.js: Advanced in-memory caching for JavaScript

https://www.npmjs.com/package/@heyputer/kv.js
1•ent101•44m ago•0 comments

Reverse Engineering the Next.js Job Interview Malware (Hidden in Next.config.js)

https://dzentota.medium.com/reverse-engineering-the-next-js-job-interview-malware-targeting-lastp...
2•dzentota•44m ago•1 comments

Oxylipins from Soybean Oil Driving Obesity

https://www.jlr.org/article/S0022-2275(25)00195-6/fulltext
1•Noaidi•45m ago•0 comments

Dangerous Streets: Using ML to Prioritize Cyclist Safety

https://joshfonseca.com/blogs/dangerous-streets
2•m-hodges•46m ago•0 comments

$1000 bounty to add a feature to coolify

https://github.com/coollabsio/coolify/issues/7423
3•jimmydin7•47m ago•0 comments

Golden Dome (orbital weapon system)

https://en.wikipedia.org/wiki/Golden_Dome_(missile_defense_system)
2•exomonk•49m ago•0 comments

GhidrAssist and GhidrAssistMCP LLM plugins reached v1.0

2•jtang613•49m ago•0 comments

Training Foundation Models on a Full-Stack AMD Platform

https://arxiv.org/abs/2511.17127
1•ngaut•50m ago•0 comments

Can bigger-is-better 'scaling laws' keep AI improving forever?

https://theconversation.com/can-bigger-is-better-scaling-laws-keep-ai-improving-forever-history-s...
6•devonnull•51m ago•0 comments

I can't tell if this photo is real or AI and that terrifies me

https://twitter.com/immasiddx/status/1992979078220263720
3•bakigul•53m ago•2 comments

An Enterprise-Level Retrieval-Augmented Generation System

https://comfyai.app/article/llm-applications/enterprise-level-rag-hands-on-practice-II
6•zljdanceholic•6mo ago

Comments

zljdanceholic•6mo ago
How can we find the key information we want in 10,000+ pages of PDFs within 2.5 hours? And for fact-checking, how do we implement the system so that answers are backed by page-level references, minimizing hallucinations?

RAG-Challenge-2 is a great open-source project by Ilya Rice that ranked 1st at the Enterprise RAG Challenge; it has 4,500+ lines of code implementing a high-performing RAG system. It might seem overwhelming to newcomers who are just beginning to learn this technology, so to help you get started quickly (and to motivate myself to learn its ins and outs) I've created a complete tutorial on it.

We provide a complete graph explaining its workflow, where multiple tools are used: Docling for parsing PDFs, LangChain for chunking text, FAISS for vectorization and similarity search, and ChatGPT as the LLM.
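To make the workflow concrete, here is a minimal, self-contained sketch of the retrieval core. It is illustrative only: in the actual project Docling parses the PDFs, LangChain does the chunking, and FAISS handles vector search; here overlapping character chunks and plain cosine similarity stand in for those libraries, and all function names are my own.

```python
import math

def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping character chunks (LangChain-style)."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=1):
    """Return the ids of the k chunks most similar to the query vector.

    `index` maps chunk id -> embedding vector; FAISS plays this role
    (much faster) in the real system.
    """
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [cid for cid, _ in scored[:k]]
```

The retrieved chunks are then stuffed into the LLM prompt as context; swapping this toy index for a FAISS index changes only the storage and search speed, not the overall flow.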

Besides that, I also outline the code flow, demonstrating the running logic across multiple Python files where beginners can easily get lost. Different files are colored differently. The point is not for you to memorize all of these file relationships; it works better to check the source code yourself and use the diagram as a reference whenever you find yourself lost in the code.

Ilya Rice's original project designed its RAG system for answering questions about companies' annual reports, so he only designed three response formats for that challenge: a name, a number, or a boolean. But to ask questions about technical material, we also want to ask general questions such as "How does RoPE work?" to learn about concepts. Therefore, I modified the system logic to fit this need by adding a custom AnswerWithRAGContextExplanationPrompt class and automatically matching the most relevant chapter and corresponding pages by searching through all the FAISS databases (retrieving only the top-1 hit). The final performance is demonstrated below (not cherry-picked; tested only once).
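The chapter-matching step described above might look roughly like the sketch below: search every per-chapter vector index, keep only the single best hit, and report which chapter and page it came from. All names here are hypothetical (not taken from the repo), and a toy `search_fn` stands in for a FAISS index search.

```python
def best_match(query_vec, chapter_indexes, search_fn):
    """Top-1 search across several per-chapter indexes.

    chapter_indexes: {chapter_name: index}
    search_fn(index, query_vec) -> (score, page) for that index's best hit.
    Returns (score, chapter, page) for the single best hit overall.
    """
    best = None
    for chapter, index in chapter_indexes.items():
        score, page = search_fn(index, query_vec)
        if best is None or score > best[0]:
            best = (score, chapter, page)
    return best
```

With FAISS, `search_fn` would wrap `index.search(query, k=1)`; the surrounding logic (compare scores across chapters, keep the winner) stays the same.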

- What are the steps to train a nanoGPT from scratch? The answer is:

Training nanoGPT from scratch involves several clearly defined steps. First, set up the environment by installing necessary libraries, using either Anaconda or Google Colab, and then download the dataset (e.g., tinyShakespeare). Next, tokenize the text into numerical representations and split the data into training and validation sets. Define the model architecture including token/positional embeddings, transformer blocks with multi-head self-attention and feed-forward networks, and layer normalization. Configure training hyperparameters and set up an optimizer (such as AdamW). Proceed with a training loop that performs forward passes, computes loss, backpropagates, and updates parameters, while periodically evaluating performance on both training and validation data. Finally, use the trained model to generate new text from a given context.
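The first data-preparation steps in that answer (tokenize the text, then split into training and validation sets) can be sketched in a few lines, in the spirit of nanoGPT's character-level tokenizer. This is an illustrative reimplementation under that assumption, not code from nanoGPT itself.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of token ids."""
    return [stoi[c] for c in text]

def train_val_split(ids, val_frac=0.1):
    """Hold out the last val_frac of the token stream for validation."""
    n = int(len(ids) * (1 - val_frac))
    return ids[:n], ids[n:]
```

The remaining steps (embeddings, transformer blocks, AdamW, the forward/backward training loop) are model code rather than data prep, so I'll leave those to the tutorial itself.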

All code is provided on Colab, and the tutorial is referenced here. Hope this helps!