An Enterprise-Level Retrieval-Augmented Generation System

https://comfyai.app/article/llm-applications/enterprise-level-rag-hands-on-practice-II
6•zljdanceholic•11mo ago

Comments

zljdanceholic•11mo ago
How can we find the key information in 10,000+ pages of PDFs within 2.5 hours? And for fact-checking, how can we make sure answers are backed by page-level references, minimizing hallucinations?

RAG-Challenge-2 is a great open-source project by Ilya Rice that ranked 1st at the Enterprise RAG Challenge. It implements a high-performing RAG system in 4500+ lines of code, which might seem overwhelming to newcomers who are just beginning to learn this technology. Therefore, to help you get started quickly (and to motivate myself to learn its ins and outs), I've created a complete tutorial on it.

The tutorial includes a complete diagram of the workflow, which uses multiple tools: Docling for parsing PDFs, LangChain for chunking text, FAISS for vector storage and similarity search, and ChatGPT as the LLM.
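To make the chunk, index, and search steps concrete, here is a minimal sketch in plain Python. It is not the code from RAG-Challenge-2 or the tutorial: the `Chunk` class and the `chunk_pages`, `embed`, and `search` helpers are hypothetical names, the bag-of-words `embed` is a toy stand-in for a real embedding model, and the brute-force scan stands in for a FAISS index. The point it illustrates is that every chunk carries its page number, so each retrieved passage can be cited at page level.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number the chunk came from


def chunk_pages(pages, max_words=40):
    """Split each page into word-bounded chunks, remembering the page number."""
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), max_words):
            piece = " ".join(words[start:start + max_words])
            chunks.append(Chunk(piece, page_number))
    return chunks


def embed(text):
    """Toy embedding (assumption): L2-normalised bag-of-words counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def search(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs, best first (brute-force scan)."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up pages standing in for parsed PDF text.
pages = [
    "Annual report overview. Revenue grew during the fiscal year.",
    "Rotary position embedding (RoPE) rotates query and key vectors "
    "by a position dependent angle.",
    "Layer normalization stabilises training of transformer blocks.",
]
chunks = chunk_pages(pages)
results = search("How does RoPE rotate query and key vectors?", chunks)
for score, chunk in results:
    print(f"page {chunk.page}: score={score:.3f} text={chunk.text[:50]!r}")
```

In a real pipeline the retrieved chunks and their page numbers would be passed to the LLM as context, and the page numbers returned alongside the answer as references.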

I also outline the code flow, showing the execution logic across multiple Python files, where beginners can easily get lost. Different files are shown in different colors. The point is not for you to memorize all of these file relationships; it works better to read the source code yourself and use the outline as a reference when you find yourself lost in the code.

Ilya Rice's original project was designed to answer questions about companies' annual reports, so it supports only three response formats for that challenge: a name, a number, or a boolean. But for technical material we inevitably ask general questions, such as "How does RoPE work?", to learn about concepts. I therefore modified the system logic to fit this need by adding a custom AnswerWithRAGContextExplanationPrompt class and by automatically matching the most related chapter and its pages through a search across all FAISS databases (retrieving only the top-1 result). The final performance is demonstrated below (not cherry-picked, only tested once).
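The chapter-matching idea can be sketched as follows, under several assumptions: each chapter has its own small index, every index is queried for its single best hit, and the best hit overall decides which chapter and page to cite. The `Hit` and `ExplanationAnswer` classes, the `route` function, the word-overlap score, and the sample chapter data are all hypothetical illustrations, not the tutorial's code and not the AnswerWithRAGContextExplanationPrompt class itself. A real system would query one FAISS database per chapter instead of the toy scoring used here.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    chapter: str
    page: int
    score: float
    text: str


@dataclass
class ExplanationAnswer:
    """Hypothetical free-form answer record with page-level references."""
    answer: str
    chapter: str
    pages: list = field(default_factory=list)


def token_overlap(query, text):
    """Toy relevance score (assumption): Jaccard overlap of lowercase words."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def top1(query, chapter, entries):
    """Best hit inside one chapter index, or None if the index is empty."""
    best = None
    for page, text in entries:
        score = token_overlap(query, text)
        if best is None or score > best.score:
            best = Hit(chapter, page, score, text)
    return best


def route(query, chapter_indexes):
    """Query every chapter index, keep each top-1 hit, return the best one."""
    hits = [top1(query, name, entries) for name, entries in chapter_indexes.items()]
    hits = [hit for hit in hits if hit is not None]
    if not hits:
        raise ValueError("no chapter index contains any entries")
    return max(hits, key=lambda hit: hit.score)


# Made-up per-chapter indexes: lists of (page, text) pairs.
chapter_indexes = {
    "positional-encoding": [
        (12, "rope rotates query and key vectors by an angle that depends on position"),
        (13, "sinusoidal encodings add fixed position signals to token embeddings"),
    ],
    "training": [
        (40, "the training loop computes loss then backpropagates and updates parameters"),
    ],
}

best = route("How does RoPE work?", chapter_indexes)
answer = ExplanationAnswer(
    answer="(LLM-generated explanation would go here)",
    chapter=best.chapter,
    pages=[best.page],
)
print(answer)
```

Keeping only the top-1 hit per index keeps routing cheap, at the cost of missing questions whose answer spans several chapters.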

- What are the steps to train a nanoGPT from scratch? The answer is:

Training nanoGPT from scratch involves several clearly defined steps. First, set up the environment by installing necessary libraries, using either Anaconda or Google Colab, and then download the dataset (e.g., tinyShakespeare). Next, tokenize the text into numerical representations and split the data into training and validation sets. Define the model architecture including token/positional embeddings, transformer blocks with multi-head self-attention and feed-forward networks, and layer normalization. Configure training hyperparameters and set up an optimizer (such as AdamW). Proceed with a training loop that performs forward passes, computes loss, backpropagates, and updates parameters, while periodically evaluating performance on both training and validation data. Finally, use the trained model to generate new text from a given context.

All the code is provided on Colab, and the tutorial is referenced here. Hope this helps!