I built FC-Eval to have a repeatable way to evaluate how well different LLMs handle function calling before using them in agent workflows.

It runs models through 30 test cases covering single-turn, multi-turn, and agentic scenarios, modeled loosely after the Berkeley Function Calling Leaderboard methodology.

Validation uses AST matching rather than string comparison to avoid false positives from formatting variations.
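To illustrate why AST matching is more forgiving than string comparison, here is a minimal sketch in Python. It is not FC-Eval's actual validator: the helper names `parse_call` and `calls_match` are hypothetical, and it assumes the model output is a single Python-style call with literal arguments (Python 3.9+ for `ast.unparse`).

```python
import ast


def parse_call(source: str):
    """Parse one call expression into (function name, positional args, keyword args)."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    name = ast.unparse(node.func)  # handles dotted names such as "weather.get"
    # literal_eval only accepts literals, so arbitrary expressions are rejected.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, args, kwargs


def calls_match(expected: str, actual: str) -> bool:
    """Compare two calls structurally; unparsable output counts as a mismatch."""
    try:
        return parse_call(expected) == parse_call(actual)
    except (SyntaxError, ValueError):
        return False


if __name__ == "__main__":
    expected = 'get_weather(city="Paris", unit="c")'
    actual = "get_weather( unit='c',city='Paris' )"
    print(expected == actual)             # plain string comparison
    print(calls_match(expected, actual))  # structural comparison
```

Because keyword arguments are compared as a dictionary, differences in quoting, whitespace, and argument order do not affect the result, while a wrong function name or argument value still fails.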

Supports two backends: OpenRouter for cloud models (GPT-5.2, Claude, Qwen 3.5, Mistral, etc.) and Ollama for local models with no API key needed.

It can also run best-of-N trials, giving you a more reliable score alongside raw accuracy.
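As a rough sketch of how repeated trials can be summarized, the Python snippet below computes three numbers from per-test pass/fail results. The metric definitions are assumptions made for illustration and may differ from what FC-Eval reports: `raw_accuracy` is the share of all individual trials that passed, `best_of_n` is the share of tests that passed at least once, and `all_n` is the share of tests that passed every time. The function name `summarize` is hypothetical.

```python
def summarize(trials: dict[str, list[bool]]) -> dict[str, float]:
    """Aggregate pass/fail results, keyed by test name, across repeated trials."""
    total_trials = sum(len(results) for results in trials.values())
    passed_trials = sum(sum(results) for results in trials.values())
    n_tests = len(trials)
    return {
        # Assumed definition: fraction of individual trials that passed.
        "raw_accuracy": passed_trials / total_trials,
        # Assumed definition: fraction of tests with at least one passing trial.
        "best_of_n": sum(any(results) for results in trials.values()) / n_tests,
        # Assumed definition: fraction of tests that passed in every trial.
        "all_n": sum(all(results) for results in trials.values()) / n_tests,
    }


if __name__ == "__main__":
    results = {
        "single_turn_01": [True, True, True],
        "multi_turn_01": [True, False, True],
        "agentic_01": [False, False, False],
    }
    print(summarize(results))
```

A large gap between `best_of_n` and `all_n` suggests the model can produce the right call but does not do so consistently.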

Results export to JSON, TXT, CSV, or Markdown.

Quick start commands:

Via OpenRouter: `fc-eval --provider openrouter --models openai/gpt-5.2 anthropic/claude-sonnet-4.6`

Via Ollama: `fc-eval --provider ollama --models llama3.2`

GitHub repo: https://github.com/gauravvij/function-calling-cli

Happy to answer questions, especially around the test case design or validation logic.

Show HN: FC-Eval – CLI to Benchmark Local or Cloud LLMs on Function Calling