Hey HN! I'm the creator of AITools.coffee. This is a metrics observatory for the open-source AI ecosystem – think "GitHub Archive meets awesome-AI, but with daily time-series tracking."
What makes this different from awesome-lists?
Awesome-lists are static Markdown files. They're great for discovery, but they:
Require manual PRs to update
Show current state only (no historical trends)
Don't track metrics (stars, forks, contributors, etc.)
Go stale quickly
AITools is a live database that:
Syncs 27,769 repositories daily via GitHub GraphQL API
Tracks 16 metrics per repo (stars, forks, issues, PRs, releases, commits, contributors, etc.)
Stores daily snapshots for time-series analysis (430M+ datapoints collected so far)
Auto-removes dead/archived repos, auto-heals renamed repos by following GitHub's 301 redirects
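For the curious, the per-repo sync looks roughly like this: one GraphQL query per repo pulling the counters, then flattening the nested payload into a metrics row. This is a simplified sketch (the real sync batches repos and fetches more fields); the GraphQL field names follow GitHub's public schema, and `flatten_snapshot` is an illustrative helper, not the production code.

```python
# Cut-down version of the per-repo metrics query (GitHub GraphQL schema).
REPO_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazerCount
    forkCount
    watchers { totalCount }
    issues(states: OPEN) { totalCount }
    pullRequests(states: OPEN) { totalCount }
    isArchived
    pushedAt
    primaryLanguage { name }
  }
}
"""

def flatten_snapshot(repo: dict) -> dict:
    """Turn the nested GraphQL payload into one flat daily-snapshot row."""
    return {
        "stars": repo["stargazerCount"],
        "forks": repo["forkCount"],
        "watchers": repo["watchers"]["totalCount"],
        "open_issues": repo["issues"]["totalCount"],
        "open_prs": repo["pullRequests"]["totalCount"],
        "archived": repo["isArchived"],
        "pushed_at": repo["pushedAt"],
        "language": (repo["primaryLanguage"] or {}).get("name"),
    }

# Example payload as the API would return it for one repository:
sample = {
    "stargazerCount": 12000, "forkCount": 900,
    "watchers": {"totalCount": 300},
    "issues": {"totalCount": 42}, "pullRequests": {"totalCount": 7},
    "isArchived": False, "pushedAt": "2025-06-01T00:00:00Z",
    "primaryLanguage": {"name": "Python"},
}
row = flatten_snapshot(sample)
```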
Technical Architecture
Backend:
PostgreSQL 18 (27K repos, 21K authors, 430K metric snapshots)
PHP 8.3 REST API with JWT auth
Nightly cron (00:01 UTC) running GitHub GraphQL sync (~25 min for full sync)
Discovery Pipeline:
Python scripts sweep 50+ AI organizations (OpenAI, Meta, Google, Anthropic, Hugging Face, etc.)
GitHub Search API monitors 30+ topics (machine-learning, LLM, transformers, etc.)
Gemini 2.5 Flash classifies repos into 30+ categories
100% manual review before publish (3-layer quality filter)
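The topic-monitoring step amounts to building one Search API qualifier string per tracked topic and deduping hits against repos already in the database. A minimal sketch, assuming hypothetical helper names (`search_queries`, `new_candidates`) and an illustrative subset of the real org/topic lists:

```python
# Illustrative subsets of the 50+ orgs and 30+ topics the pipeline sweeps.
ORGS = ["openai", "meta-llama", "google-deepmind", "anthropics", "huggingface"]
TOPICS = ["machine-learning", "llm", "transformers"]

def search_queries(topics, min_stars=10):
    """One GitHub Search API qualifier string per topic,
    pre-applying the quality floor (min stars, not archived)."""
    return [f"topic:{t} stars:>={min_stars} archived:false" for t in topics]

def new_candidates(found, known):
    """Keep only 'owner/repo' full names not already tracked
    (case-insensitive, since GitHub repo names are)."""
    known_lower = {k.lower() for k in known}
    return sorted({f for f in found if f.lower() not in known_lower})
```

Deduping case-insensitively matters here because GitHub treats `Owner/Repo` and `owner/repo` as the same repository.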
Frontend:
Tailwind CSS with glassmorphism design
Alpine.js for interactivity
Chart.js + D3.js for metrics visualization (star distribution, language breakdown, contributor growth)
Data Freshness:
Last sync: typically <6 hours ago
440K+ datapoints added daily (27K repos × 16 metrics)
Rate limit: 1 GraphQL query/sec (stays well under GitHub's 5,000-point/hour GraphQL quota)
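The 1 query/sec pacing is easy to enforce with a scheduling-slot pacer: at 3,600 simple queries/hour, each costing roughly 1 point, the sync stays comfortably under the 5,000-point quota. This `Pacer` class is a hypothetical sketch (not the production scheduler); the injectable clock exists so it can be tested without sleeping.

```python
import time

class Pacer:
    """Space requests at least `interval` seconds apart (1 req/sec here).
    Each caller asks delay() how long to wait before its next request."""

    def __init__(self, interval: float = 1.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock       # injectable for testing
        self._next = None        # earliest allowed time of the next request

    def delay(self) -> float:
        """Seconds to wait before the next request may be sent."""
        now = self.clock()
        if self._next is None or now >= self._next:
            self._next = now + self.interval
            return 0.0
        wait = self._next - now
        self._next += self.interval  # reserve the following slot
        return wait
```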
What I'm tracking
Per repository (16 datapoints):
stars, forks, watchers, open issues, open PRs, releases, commits (last 100), contributors, size, archived status, default branch, pushed_at, created_at, license, language, topics
Per author (8 datapoints):
followers, following, public repos, gists, bio, company, location, created_at
All stored as daily snapshots → enables time-series analysis (star velocity, contributor growth, issue trends).
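Once the snapshots exist, a metric like star velocity falls out of a window over the rows. A sketch under the assumption that snapshots arrive as date-sorted `(date, stars)` pairs (the function name is illustrative):

```python
from datetime import date, timedelta

def star_velocity(snapshots, days=7):
    """Average stars gained per day over the trailing `days`-day window.
    snapshots: list of (date, stars) tuples sorted ascending by date."""
    if len(snapshots) < 2:
        return 0.0
    end_date, end_stars = snapshots[-1]
    cutoff = end_date - timedelta(days=days)
    # Earliest snapshot inside the window is the baseline.
    window = [(d, s) for d, s in snapshots if d >= cutoff]
    start_date, start_stars = window[0]
    span = (end_date - start_date).days or 1   # guard against a zero-day span
    return (end_stars - start_stars) / span
```

The same window pattern gives contributor velocity and issue trends by swapping in a different metric column.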
Current Scale
27,769 AI repositories tracked
20,992 open-source authors
12.4M+ total GitHub stars (aggregated)
430K+ metric snapshots collected
440K datapoints added per day
Limitations & Future Plans
What's NOT implemented yet:
Public API (planned Q2 2026, always free with rate limits)
Historical charts (star growth over time) – data is there, visualization coming soon
Trending repos (7-day star velocity ranking) – planned next month
Email alerts for repo milestones – maybe later
Open Source?
Not yet. Considering open-sourcing the discovery pipeline + classification logic, but the full platform will likely remain closed-source (hosting costs, spam prevention, API abuse).
Why I built this
I got frustrated manually tracking AI repos across GitHub, Twitter, and Discord. There's no single place to:
Compare similar tools by actual metrics (not just star count)
See which projects are actively maintained vs abandoned
Track contributor velocity (is the project growing or stagnating?)
Filter by license, language, framework, use case
Awesome-lists are great for curated discovery, but terrible for data-driven analysis. I wanted both.
Questions I'm expecting
Q: How do you handle spam/SEO farms?
A: 3-layer filter: (1) Gemini AI relevance check, (2) Manual review (100% of submissions), (3) Automated quality signals (min 10 stars, active within 2 years, not archived).
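Layer 3 of that filter is simple enough to sketch as a predicate over a flattened repo row. The field names here are illustrative (matching the snapshot shape above, not necessarily the real schema), and the "active within 2 years" check is approximated as 730 days:

```python
from datetime import datetime, timedelta, timezone

def passes_quality_bar(repo: dict, now: datetime) -> bool:
    """Automated quality signals: min 10 stars, pushed within
    ~2 years, not archived. Field names are illustrative."""
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    return (
        repo["stars"] >= 10
        and not repo["archived"]
        and now - pushed <= timedelta(days=730)
    )
```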
Q: What about non-GitHub repos (GitLab, Bitbucket)?
A: Not supported yet. 99% of open-source AI is on GitHub, so I focused there. May expand later if there's demand.
Q: Can I submit my own project?
A: Yes! Use the "Submit Tool" form (requires GitHub login to prevent spam). Your repo will be queued for review. Alternatively, if you're in one of the 50 orgs I monitor, your repo will be discovered automatically within a week.
Q: How accurate is Gemini classification?
A: ~85% accurate on initial categorization. I manually review and re-categorize misclassifications. Common mistakes: RAG frameworks → agent frameworks, base models → fine-tuned models.
Q: Will you add X feature?
A: Probably! Top requests: historical star charts, trending page, email alerts, public API. Working through them in order of complexity vs impact.
Q: What's your business model?
A: None yet. This is a side project that costs ~$30/month (SiteGround hosting + Gemini API). If it grows beyond hobby scale, I might add sponsored listings or premium API tiers, but the core data will stay free.
Feedback welcome! Especially:
Missing repos/categories you'd like to see tracked
UI/UX improvements (the homepage is dense with data, might be overwhelming)
Technical architecture critiques (I'm sure there are better ways to do this)
Feature requests (what metrics would actually be useful?)
Tech stack: PostgreSQL, PHP, Python, Gemini 2.5 Flash, GitHub GraphQL API, Chart.js, D3.js, TailwindCSS, Alpine.js
Live at: https://aitools.coffee