
How do we train and evaluate Search Agents?

I am SUPER EXCITED to publish a new episode of the Weaviate Podcast with Nandan Thakur on Search Agents!

First, congratulations to Nandan, who has just completed his Ph.D. at the University of Waterloo, advised by Professor Jimmy Lin!

During this time he published several impactful works such as BEIR, MIRACL, FreshStack, and many more.

This podcast dives into his new work on ORBIT and the current state of Search Agents!

ORBIT contains 20K training examples, each one a complex, multi-hop question paired with a short verifiable answer. For example, "What was the runtime of the 2017 animated film set inside a smartphone, directed by..." (Answer: 86 minutes).

This dataset is used to train Search Agents on queries that take roughly 4 to 5 searches to answer.

The crazy part is that ORBIT was generated entirely without paid Web Search APIs! The entire pipeline runs on a 2018 Linux laptop driving DeepSeek's free chat interface!

Trained on ORBIT, Qwen3-4B beats InfoSeeker-4B by 4.3 EM and Search-R1-4B by 9.0 EM across 7 Wikipedia QA benchmarks.
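
For readers less familiar with the metric: assuming "EM" here means exact match, the usual short-answer QA score, a minimal sketch of how it is typically computed looks like the following. The normalization (lowercasing, stripping punctuation and English articles) follows the common SQuAD-style convention and is an assumption, not necessarily what the ORBIT evaluation uses; the function names are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the actual ORBIT evaluation
    may normalize answers differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match a gold answer."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this reading, a gap of "4.3 EM" means 4.3 percentage points more questions answered with an exactly matching string, which is why short verifiable answers like "86 minutes" suit this kind of evaluation.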

There are a lot of interesting nuggets in this one! As always, I hope you find it useful, and I am more than happy to discuss further!

YouTube: https://youtu.be/B71WF6EtgK8

Spotify: https://spotifycreators-web.app.link/e/IAgKLmSsT2b
