AI Web Search and Scraping

https://github.com/larryste1/web-search-tool

2•larryste•1h ago

Comments

larryste•1h ago

# Show HN: web-search-tool – Search/scrape web with AI-friendly output

*Project:* https://github.com/larryste1/web-search-tool

*PyPI:* https://pypi.org/project/web-search-tool/

## The Problem

Building an AI assistant takes reliable search with fallback, clean content extraction, API flexibility, and structured JSON output. Existing solutions tend to be single-backend (so they break when that API fails), too complex, or limited to raw HTML output.

## The Solution

`web-search-tool` searches/scrapes with clean, AI-friendly output:

```bash
pip install web-search-tool

web-search "Python async best practices"                   # Search with AI answer
web-search "React hooks tutorial" --scrape                 # Full article content
web-search "machine learning" --include-domain arxiv.org   # Filter domain
web-search "API design" --json                             # JSON output
```

## Features

- *3 Backends with Auto-Fallback*: Tavily → Serper → DuckDuckGo
- *Content Scraping*: Extract main article text via BeautifulSoup
- *Domain Filtering*: Include/exclude specific domains
- *Search Depth*: Basic or advanced
- *AI-Friendly Output*: Structured results with optional AI answers
- *JSON Output*: Pipe to jq or parse in scripts
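To make the domain filtering feature concrete, here is a minimal sketch of include/exclude matching on result URLs. It is not the tool's actual code: `host_matches` and `filter_by_domain` are hypothetical names, and matching subdomains as well as the exact host is an assumption about what a user would expect.

```python
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_by_domain(urls, include=(), exclude=()):
    """Keep URLs on an included domain (if any are given) and not on an excluded one."""
    kept = []
    for url in urls:
        # With an include list, drop anything that matches none of its domains.
        if include and not any(host_matches(url, d) for d in include):
            continue
        # The exclude list always wins.
        if any(host_matches(url, d) for d in exclude):
            continue
        kept.append(url)
    return kept
```

Comparing on the parsed hostname rather than on a substring of the URL keeps a host like `notarxiv.org` from slipping through an `arxiv.org` filter.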

## How It Works

```
Query → Tavily (AI, needs key) → Serper (Google, needs key) → DuckDuckGo (free)
```
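The chain above can be sketched as a loop that tries each backend in order and returns the first one that succeeds. This is an illustration under assumptions, not the package's implementation: `search_with_fallback` and `AllBackendsFailed` are hypothetical names, and a backend is modelled as any callable that takes a query and either returns a list of results or raises.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: a backend is a callable taking a query and returning results.
Backend = Callable[[str], List]


class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain raised an exception."""


def search_with_fallback(query: str, backends: Iterable[Tuple[str, Backend]]):
    """Try each (name, backend) pair in order; return the first success."""
    errors = []
    for name, backend in backends:
        try:
            results = backend(query)
        except Exception as exc:  # a real tool would likely catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        return name, results
    # Nothing succeeded: report every failure, not only the last one.
    raise AllBackendsFailed("; ".join(errors))
```

Returning the backend name alongside the results lets the caller print a line like `Backend: Tavily`. Whether an empty result list should also trigger the next backend is a design choice this sketch leaves out.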

## Examples

```bash
# AI Research with Answer
$ web-search "What is Rust ownership?"

Search: What is Rust ownership?
Backend: Tavily
Answer: Rust ownership manages memory allocation. Each value has one owner...

# Scrape Full Articles
$ web-search "Python decorators" --scrape --num 3

# Domain-Specific
$ web-search "type hints" --include-domain realpython.com --include-domain docs.python.org
```

```python
# Programmatic Use
from web_search_tool import search_web

result = search_web("Python best practices", scrape_urls=True)
```

## API Keys

| Backend | Key | Get Key |
|---------|-----|---------|
| Tavily | Optional | https://tavily.com/ |
| Serper | Optional | https://serper.dev/ |
| DuckDuckGo | None | Free |

```bash
export TAVILY_API_KEY=your-key-here
export SERPER_API_KEY=your-key-here
```

Without keys, the tool falls back to DuckDuckGo automatically.
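One way to picture the key handling is a function that decides which backends are usable from the environment. This is a rough sketch, not the package's code: `BACKEND_KEYS` and `available_backends` are hypothetical names, and only the two environment variable names come from the post.

```python
import os

# Preference order from the post; None means the backend needs no key.
BACKEND_KEYS = [
    ("tavily", "TAVILY_API_KEY"),
    ("serper", "SERPER_API_KEY"),
    ("duckduckgo", None),
]


def available_backends(env=None):
    """Return backend names, in preference order, usable with the keys in `env`."""
    env = os.environ if env is None else env
    # A missing or empty key counts as "not configured".
    return [name for name, key in BACKEND_KEYS if key is None or env.get(key)]
```

With no keys set this returns only `["duckduckgo"]`, which matches the keyless behaviour described above.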

## Why I Built This

While building AI assistants, I kept hitting the same problems: a single point of failure, messy output, and no fallback. This tool tries multiple backends, extracts clean text, returns structured JSON, and works without API keys.

## Tech Stack

Requests, BeautifulSoup4, Tavily API, Serper API, DuckDuckGo HTML

## Try It

```bash
pip install web-search-tool
web-search "Python tutorials"  # No API key needed
```

*GitHub:* https://github.com/larryste1/web-search-tool

*Feedback:* What backends should I add? How do you handle web search in AI projects?

---

Built after too many API failures with single-backend tools.

