Hey HN! I'm the creator of AITools.coffee. This is a metrics observatory for the open-source AI ecosystem – think "GitHub Archive meets awesome-AI, but with daily time-series tracking."
What makes this different from awesome-lists?
Awesome-lists are static Markdown files. They're great for discovery, but they:
Require manual PRs to update
Show current state only (no historical trends)
Don't track metrics (stars, forks, contributors, etc.)
Go stale quickly
AITools is a live database that:
Syncs 27,769 repositories daily via GitHub GraphQL API
Tracks 16 metrics per repo (stars, forks, issues, PRs, releases, commits, contributors, etc.)
Stores daily snapshots for time-series analysis (430M+ datapoints collected so far)
Auto-removes dead/archived repos, auto-heals renamed repos by following GitHub's 301 redirects
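For the curious, the per-repo sync looks roughly like this: one GraphQL query per repo pulling the counters, then flattening the nested payload into a metrics row. This is a simplified sketch (the real sync batches repos and fetches more fields); the GraphQL field names follow GitHub's public schema, and `flatten_snapshot` is an illustrative helper, not the production code.

```python
# Cut-down version of the per-repo metrics query (GitHub GraphQL schema).
REPO_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazerCount
    forkCount
    watchers { totalCount }
    issues(states: OPEN) { totalCount }
    pullRequests(states: OPEN) { totalCount }
    isArchived
    pushedAt
    primaryLanguage { name }
  }
}
"""

def flatten_snapshot(repo: dict) -> dict:
    """Turn the nested GraphQL payload into one flat daily-snapshot row."""
    return {
        "stars": repo["stargazerCount"],
        "forks": repo["forkCount"],
        "watchers": repo["watchers"]["totalCount"],
        "open_issues": repo["issues"]["totalCount"],
        "open_prs": repo["pullRequests"]["totalCount"],
        "archived": repo["isArchived"],
        "pushed_at": repo["pushedAt"],
        "language": (repo["primaryLanguage"] or {}).get("name"),
    }

# Example payload as the API would return it for one repository:
sample = {
    "stargazerCount": 12000, "forkCount": 900,
    "watchers": {"totalCount": 300},
    "issues": {"totalCount": 42}, "pullRequests": {"totalCount": 7},
    "isArchived": False, "pushedAt": "2025-06-01T00:00:00Z",
    "primaryLanguage": {"name": "Python"},
}
row = flatten_snapshot(sample)
```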
Technical Architecture
Backend:
PostgreSQL 18 (27K repos, 21K authors, 430K metric snapshots)
PHP 8.3 REST API with JWT auth
Nightly cron (00:01 UTC) running GitHub GraphQL sync (~25 min for full sync)
Discovery Pipeline:
Python scripts sweep 50+ AI organizations (OpenAI, Meta, Google, Anthropic, Hugging Face, etc.)
GitHub Search API monitors 30+ topics (machine-learning, LLM, transformers, etc.)
Gemini 2.5 Flash classifies repos into 30+ categories
100% manual review before publish (3-layer quality filter)
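The topic-monitoring step amounts to building one Search API qualifier string per tracked topic and deduping hits against repos already in the database. A minimal sketch, assuming hypothetical helper names (`search_queries`, `new_candidates`) and an illustrative subset of the real org/topic lists:

```python
# Illustrative subsets of the 50+ orgs and 30+ topics the pipeline sweeps.
ORGS = ["openai", "meta-llama", "google-deepmind", "anthropics", "huggingface"]
TOPICS = ["machine-learning", "llm", "transformers"]

def search_queries(topics, min_stars=10):
    """One GitHub Search API qualifier string per topic,
    pre-applying the quality floor (min stars, not archived)."""
    return [f"topic:{t} stars:>={min_stars} archived:false" for t in topics]

def new_candidates(found, known):
    """Keep only 'owner/repo' full names not already tracked
    (case-insensitive, since GitHub repo names are)."""
    known_lower = {k.lower() for k in known}
    return sorted({f for f in found if f.lower() not in known_lower})
```

Deduping case-insensitively matters here because GitHub treats `Owner/Repo` and `owner/repo` as the same repository.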
Frontend:
Tailwind CSS with glassmorphism design
Alpine.js for interactivity
Chart.js + D3.js for metrics visualization (star distribution, language breakdown, contributor growth)
Data Freshness:
Last sync: typically <6 hours ago
440K+ datapoints added daily (27K repos × 16 metrics)
Rate limit: 1 GraphQL query/sec (stays well under GitHub's 5,000-point/hour GraphQL quota)
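The 1 query/sec pacing is easy to enforce with a scheduling-slot pacer: at 3,600 simple queries/hour, each costing roughly 1 point, the sync stays comfortably under the 5,000-point quota. This `Pacer` class is a hypothetical sketch (not the production scheduler); the injectable clock exists so it can be tested without sleeping.

```python
import time

class Pacer:
    """Space requests at least `interval` seconds apart (1 req/sec here).
    Each caller asks delay() how long to wait before its next request."""

    def __init__(self, interval: float = 1.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock       # injectable for testing
        self._next = None        # earliest allowed time of the next request

    def delay(self) -> float:
        """Seconds to wait before the next request may be sent."""
        now = self.clock()
        if self._next is None or now >= self._next:
            self._next = now + self.interval
            return 0.0
        wait = self._next - now
        self._next += self.interval  # reserve the following slot
        return wait
```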
What I'm tracking
Per repository (16 datapoints):
stars, forks, watchers, open issues, open PRs, releases, commits (last 100), contributors, size, archived status, default branch, pushed_at, created_at, license, language, topics
Per author (8 datapoints):
followers, following, public repos, gists, bio, company, location, created_at
All stored as daily snapshots → enables time-series analysis (star velocity, contributor growth, issue trends).
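Once the snapshots exist, a metric like star velocity falls out of a window over the rows. A sketch under the assumption that snapshots arrive as date-sorted `(date, stars)` pairs (the function name is illustrative):

```python
from datetime import date, timedelta

def star_velocity(snapshots, days=7):
    """Average stars gained per day over the trailing `days`-day window.
    snapshots: list of (date, stars) tuples sorted ascending by date."""
    if len(snapshots) < 2:
        return 0.0
    end_date, end_stars = snapshots[-1]
    cutoff = end_date - timedelta(days=days)
    # Earliest snapshot inside the window is the baseline.
    window = [(d, s) for d, s in snapshots if d >= cutoff]
    start_date, start_stars = window[0]
    span = (end_date - start_date).days or 1   # guard against a zero-day span
    return (end_stars - start_stars) / span
```

The same window pattern gives contributor velocity and issue trends by swapping in a different metric column.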
Current Scale
27,769 AI repositories tracked
20,992 open-source authors
12.4M+ total GitHub stars (aggregated)
430K+ metric snapshots collected
440K datapoints added per day
Limitations & Future Plans
What's NOT implemented yet:
Public API (planned Q2 2026, always free with rate limits)
Historical charts (star growth over time) – data is there, visualization coming soon
Trending repos (7-day star velocity ranking) – planned next month
Email alerts for repo milestones – maybe later
Open Source?
Not yet. Considering open-sourcing the discovery pipeline + classification logic, but the full platform will likely remain closed-source (hosting costs, spam prevention, API abuse).
Why I built this
I got frustrated manually tracking AI repos across GitHub, Twitter, and Discord. There's no single place to:
Compare similar tools by actual metrics (not just star count)
See which projects are actively maintained vs abandoned
Track contributor velocity (is the project growing or stagnating?)
Filter by license, language, framework, use case
Awesome-lists are great for curated discovery, but terrible for data-driven analysis. I wanted both.
Questions I'm expecting
Q: How do you handle spam/SEO farms?
A: 3-layer filter: (1) Gemini AI relevance check, (2) Manual review (100% of submissions), (3) Automated quality signals (min 10 stars, active within 2 years, not archived).
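Layer 3 of that filter is simple enough to sketch as a predicate over a flattened repo row. The field names here are illustrative (matching the snapshot shape above, not necessarily the real schema), and the "active within 2 years" check is approximated as 730 days:

```python
from datetime import datetime, timedelta, timezone

def passes_quality_bar(repo: dict, now: datetime) -> bool:
    """Automated quality signals: min 10 stars, pushed within
    ~2 years, not archived. Field names are illustrative."""
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    return (
        repo["stars"] >= 10
        and not repo["archived"]
        and now - pushed <= timedelta(days=730)
    )
```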
Q: What about non-GitHub repos (GitLab, Bitbucket)?
A: Not supported yet. 99% of open-source AI is on GitHub, so I focused there. May expand later if there's demand.
Q: Can I submit my own project?
A: Yes! Use the "Submit Tool" form (requires GitHub login to prevent spam). Your repo will be queued for review. Alternatively, if you're in one of the 50 orgs I monitor, your repo will be discovered automatically within a week.
Q: How accurate is Gemini classification?
A: ~85% accurate on initial categorization. I manually review and re-categorize misclassifications. Common mistakes: RAG frameworks → agent frameworks, base models → fine-tuned models.
Q: Will you add X feature?
A: Probably! Top requests: historical star charts, trending page, email alerts, public API. Working through them in order of complexity vs impact.
Q: What's your business model?
A: None yet. This is a side project that costs ~$30/month (SiteGround hosting + Gemini API). If it grows beyond hobby scale, I might add sponsored listings or premium API tiers, but the core data will stay free.
Feedback welcome! Especially:
Missing repos/categories you'd like to see tracked
UI/UX improvements (the homepage is dense with data, might be overwhelming)
Technical architecture critiques (I'm sure there are better ways to do this)
Feature requests (what metrics would actually be useful?)
Tech stack: PostgreSQL, PHP, Python, Gemini 2.5 Flash, GitHub GraphQL API, Chart.js, D3.js, TailwindCSS, Alpine.js
Live at: https://aitools.coffee