Laguna XS.2 and M.1

https://poolside.ai/blog/laguna-a-deeper-dive

40•tosh•1h ago

Comments

rohitpaulk•1h ago

Been testing these via their "pool" agent. It's fast, and the agent adheres to the ACP spec pretty well (better than codex, opencode etc.) so it's a good experience in Zed.

throwaw12•1h ago

Has anyone tried these models?

I like their honesty in benchmarks: it looks like Qwen3.6 35B is outperforming their Laguna M.1 225B model.

kingjimmy•56m ago

The color coding makes those benchmark charts impossible to understand. Very pretty, though.

data-ottawa•52m ago

For what it's worth, the bars correspond in order with the legend. Plus there’s hover text.

franksiem•53m ago

It felt like they would never come out of stealth mode, but it's very nice to see it materialize into something competitive.

refulgentis•18m ago

What makes them distinctive?

throwaw12•18m ago

Not sure this is competitive; look at the numbers for Qwen3.6.

jaen•43m ago

For similarly sized models, not looking very good on the slightly-less-benchmaxxed Terminal-Bench 2.0:

  Laguna XS.2  33B-A3B params: 30.6
  Qwen 3.6     35B-A3B       : 51.5
  Devstral 2   123B          : 31.2

Quite a huge lead for Qwen... well, at least it's catching up to other smaller Western labs.

megavon•33m ago

Need to look at SWEBench-Pro; it's super competitive. I suspect they'll catch up given the longer tail on TB scores.

jaen•17m ago

Just by the (lack of) inter-model variance, I don't think SWEBench-Pro does a very good job of representing model capability. Terminal-Bench seems more challenging and separates the wheat from the chaff.

Also, *ops work, which in my experience can actually be more complicated than SWE, is obviously underrepresented there.

speedgoose•42m ago

Please update the charts. Consider using textures or fill patterns.

I usually score pretty well in colour perception tests but distinguishing between those two purples made me doubt myself.

matthewfcarlson•21m ago

My phone is in grayscale to make it less interesting (I still watch way too many videos in grayscale, but it helps), so I’m right with you.

esafak•11m ago

They're not winning any popular benchmark, so is there some niche where this excels?
