Oras•6h ago
- Reasoning.
- Structured Output.
- Logprobs.
What's the added value from your tests? To verify these features exist?
scosman•6h ago
Those tools map API compatibility. These tests + config add:
1) Check which features are actually available for each model.
2) Check which parameters you need for best results. For example, there are roughly six different options for requesting JSON from OpenRouter, and different models work best with different ones (see the sketch after this list).
3) Check that the features work consistently. API compatibility and actual functionality are not the same thing.
4) Go much deeper: are the models good enough for synthetic data generation? Can they generate uncensored model inputs if you're building a toxicity eval? etc.
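For the curious, here's a minimal sketch of a few of those JSON-request options, assuming the OpenAI Python SDK pointed at OpenRouter's OpenAI-compatible endpoint. The model slug and schema are illustrative placeholders, and which options a given model/provider actually honors is exactly the kind of thing the tests probe:

```python
# Sketch: several ways to ask for JSON via OpenRouter's OpenAI-compatible
# chat completions API. Support varies per model/provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

messages = [{"role": "user",
             "content": "Return a JSON object with keys 'city' and 'population' for Paris."}]
model = "openai/gpt-4o-mini"  # illustrative model slug

# Option 1: prompt-only. Ask for JSON in the prompt, no API hints.
# Works everywhere, but output may include prose or markdown fences.
r1 = client.chat.completions.create(model=model, messages=messages)

# Option 2: JSON mode. Syntactically valid JSON on models that support it,
# but no guarantee about the shape of the object.
r2 = client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={"type": "json_object"},
)

# Option 3: structured outputs. Constrain generation to an explicit schema.
r3 = client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
                "additionalProperties": False,
            },
        },
    },
)

# Option 4: forced tool calling. The tool's parameter schema doubles as the
# output schema; the "result" arrives as tool-call arguments.
r4 = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=[{
        "type": "function",
        "function": {
            "name": "report_city_info",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
            },
        },
    }],
    tool_choice={"type": "function", "function": {"name": "report_city_info"}},
)
```

Some models do best with Option 3, others only reliably honor Option 4 or need prompt-level instructions on top, which is why a per-model config matters.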