I built a scoring system to measure how AI models represent software products when users ask buying questions.
The process: I take a product, generate the queries a buyer would ask (category, competitor alternatives, head-to-head), run them through ChatGPT, Claude, Perplexity, and Gemini, then score how prominently the product appears in each response (0-10).
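A rough sketch of the query-generation step, assuming the three query types described above. Function and parameter names here are illustrative, not the actual code:

```python
# Hypothetical sketch: build the three buyer-query types for one product.
# Names (buyer_queries, category, competitors) are illustrative.
def buyer_queries(product: str, category: str, competitors: list[str]) -> list[str]:
    queries = [f"best {category}"]                             # category query
    queries += [f"{product} vs {c}" for c in competitors]      # head-to-head
    queries += [f"alternatives to {c}" for c in competitors]   # competitor alternatives
    return queries

print(buyer_queries("Cal.com", "scheduling tool", ["Calendly"]))
# → ['best scheduling tool', 'Cal.com vs Calendly', 'alternatives to Calendly']
```

Each query then gets sent to all four models and the responses are scored.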
Some findings from scanning 35 products:
ChatGPT is the biggest blind spot. It scores 0 for most open-source challengers, even ones with 30K+ GitHub stars.
Incumbents dominate across all models. "Best project management tool" returns Jira, Linear, Asana — never Plane (31K stars on GitHub).
Brand-name queries work. "Cal.com vs Calendly" scores 9+ everywhere. But generic category queries ("best scheduling tool") often return 0.
Funding doesn't help either. Trigger.dev raised $16M and still scores 0/10 for background-job queries.
The scoring methodology: each model gets 0-10 per query based on mention position, detail, and recommendation strength. Product consensus = average across all models and queries.
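The consensus math is a plain average over every (model, query) score. A minimal sketch, with made-up scores for illustration:

```python
# Minimal sketch of the consensus calculation. Each model produces one
# 0-10 score per query (mention position, detail, and recommendation
# strength folded into a single number). Scores below are illustrative,
# not real scan data.
from statistics import mean

scores = {
    "chatgpt":    [0, 2, 9],   # per-query scores for one product
    "claude":     [3, 4, 9],
    "perplexity": [5, 6, 10],
    "gemini":     [2, 3, 9],
}

# Product consensus = mean over all models and all queries.
consensus = mean(s for per_query in scores.values() for s in per_query)
print(round(consensus, 1))  # → 5.2
```

Note how a product can score 9+ on brand-name queries and still land a middling consensus because category queries drag the average down.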
Code and methodology are open. Happy to scan any product if you drop it in the comments.