Tokens are expensive, and companies want to make sure they are being used effectively. Fair enough. The problem is that Weave provides nearly zero insight into how its metrics work. They do tell you they use AI to do the analysis, so in practice it is likely LLMs analyzing the usage of other LLMs.
If your employer sets this up, your work with AI is reduced to a single number. From firsthand experience, these numbers are used to make firing decisions. The number is, laughably, presented as "Code output" xx.x / week. It doesn't tell you what the unit is or how they arrive at it.
They provide no assurances or proof that their model isn't biased. They claim their dataset is built from expert-level PRs, but they don't say what the dataset is or exactly how it was compiled. Does it cover native and non-native English speakers? Does it cover young developers' work as well as older developers' work? Did it pick up other biases from usernames, git commit email addresses, etc.?
Who knows!? But guess you're fired anyway because computer number go down.
The strawberry on top is that in their [blog posts][1] they claim both 94% accuracy and a 0.94 correlation while referring to the same thing. They don't even know the statistical difference between accuracy and correlation, yet their model gets to decide how good of a vibe coder you are.
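To be concrete about why that conflation matters: accuracy and Pearson correlation are different statistics and need not agree. A toy Python sketch (with made-up binary labels, nothing to do with Weave's actual data) shows the same predictions scoring 0.75 on one and 0.5 on the other:

```python
# Hypothetical ground-truth labels and classifier predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# Accuracy: fraction of exact matches.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Pearson correlation: covariance normalized by standard deviations.
n = len(y_true)
mean_t = sum(y_true) / n
mean_p = sum(y_pred) / n
cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
std_t = (sum((t - mean_t) ** 2 for t in y_true) / n) ** 0.5
std_p = (sum((p - mean_p) ** 2 for p in y_pred) / n) ** 0.5
corr = cov / (std_t * std_p)

print(accuracy)  # 0.75
print(corr)      # 0.5
```

Same model, same data, two different numbers. Quoting "94%" and "0.94" as one figure means at least one of them is wrong.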
Weave's investors are:
- Moonfire — lead investor (European early-stage VC)
- Burst Capital — co-lead (early-stage VC)
- Y Combinator — participant (the accelerator program; 25% of their current batch uses Weave, per Weave's own blog)
- Pioneer Fund — participant (per PitchBook)
- Roar Ventures — participant (per PitchBook)
[0]: https://workweave.dev/
[1]: https://web.archive.org/web/20260212040803/https://workweave.dev/blog/weave-vs-linearb