Our compare_a_9 spent $0.50 per task on GPQA and outperformed GPT-5.4, Claude Opus 4.1, Grok 4 (0309 v2), ... By optimizing the base model, DeepSeek-V3.2-Thinking, it achieved a 6.5% improvement and ranked among the top 5 LLMs worldwide.

https://hotblaz.com
https://huggingface.co/Hotblaz/Compare_Anti_9
markliuhotblaz•1h ago