The scoring checks every word you write against the models' logprobs. Right now I'm using Llama 3.1, DeepSeek V3, and Qwen3 to keep costs low. I tried to calibrate it so that responses from other models (ChatGPT, Claude) score 100% and interesting human responses score in the 10-30% range.
Totally free, no signup.
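
For anyone curious what logprob-based scoring can look like, here is a rough sketch. It is not the actual formula used here: the function name `ai_likeness`, the `floor`/`ceiling` calibration values, and the simple averaging across models are all assumptions made for illustration. It takes per-token logprobs that you have already obtained from each model and maps the mean onto a 0-100% scale.

```python
from statistics import mean

def ai_likeness(logprobs_by_model, floor=-6.0, ceiling=-0.5):
    """Map mean per-token logprob to a 0-100 "model-like" score.

    logprobs_by_model: dict of model name -> list of per-token logprobs
    floor / ceiling:   hypothetical calibration points (assumed values);
                       a mean at or below `floor` scores 0, at or above
                       `ceiling` scores 100.
    """
    # Average within each model first so long and short tokenizations
    # contribute equally, then average across models.
    per_model = [mean(lps) for lps in logprobs_by_model.values() if lps]
    if not per_model:
        raise ValueError("no token logprobs supplied")
    avg = mean(per_model)

    # Linear rescale between the two calibration points, clamped to [0, 1].
    scaled = (avg - floor) / (ceiling - floor)
    return 100.0 * min(1.0, max(0.0, scaled))

# Highly predictable text: every token was close to the model's top choice.
print(ai_likeness({"model_a": [-0.1, -0.2], "model_b": [-0.3, -0.1]}))
# Very improbable text: the models found every token surprising.
print(ai_likeness({"model_a": [-9.0, -11.0]}))
```

Under this kind of scheme, any text the models find improbable lands at the human end of the scale, whether or not it is meaningful.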
niklio•48m ago
And yes, gibberish responses score very human :)