Ask any LLM to "consider multiple perspectives" and you get hedged consensus. The model acknowledges trade-offs exist, then settles on a moderate position that offends nobody. Useful for summaries. Useless for decision making.
Perspectives forces disagreement. Eight personas with fundamentally incompatible frameworks debate your question through a structured protocol, then vote using Single Transferable Vote (STV) to surface where they actually land. The output is a PDF report synthesising all of it.
How it works
Blind Proposals: Each persona generates a position without seeing the others'. This prevents anchoring, where early responses shape later ones, and sidesteps the default sycophancy of LLMs.
Interrogation of Blind Proposals: Each proposal faces structured challenges from three opposing personas. A "high-empathy" persona (e.g. The Idealist) is challenged by a "low-empathy" cluster (e.g. The Pragmatist). This reveals exactly where arguments buckle under pressure.
Discussion & Voting: Personas can debate (optional) before ranking preferences via STV. This highlights first-choice winners and preference flows rather than simple majority rule.
Analysis/Prediction Report: The final PDF leads with recommendations, followed by supporting analysis (factual background, risk assessment, evidence quality).
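For the curious: with a single winner, STV reduces to instant-runoff counting, i.e. repeatedly eliminate the option with the fewest first-choice votes and transfer its ballots to their next surviving preference. A minimal sketch in Python (the ballots are illustrative, not the actual internals):

    from collections import Counter

    def stv_single_winner(ballots: list[list[str]]) -> str:
        """Instant-runoff counting: eliminate the option with the fewest
        first-choice votes each round; its ballots transfer onward."""
        candidates = {option for ballot in ballots for option in ballot}
        while True:
            # Count each ballot's highest-ranked surviving option.
            tally = Counter()
            for ballot in ballots:
                for option in ballot:
                    if option in candidates:
                        tally[option] += 1
                        break
            leader, votes = tally.most_common(1)[0]
            if votes * 2 > sum(tally.values()) or len(candidates) == 1:
                return leader  # outright majority, or last option standing
            candidates.remove(min(tally, key=tally.get))

    # Illustrative ballots: each persona ranks the options it debated.
    ballots = [
        ["Option A", "Option C", "Option B"],
        ["Option A", "Option B", "Option C"],
        ["Option B", "Option C", "Option A"],
        ["Option C", "Option B", "Option A"],
        ["Option C", "Option B", "Option A"],
    ]
    print(stv_single_winner(ballots))  # Option B's ballot flows to Option C

This is why the report can show preference flows: the transfer step records where a losing option's support actually went.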
Two Operational Modes
Analysis Mode ("What should we do?"): Evaluates options and surfaces trade-offs. Output is qualitative judgment.
Prediction Mode ("What will happen?"): Generates probability estimates with resolution criteria.
Feedback Loops
Most AI agent projects have no way to measure whether their outputs are actually good. Users provide subjective feedback, which is noisy and unreliable. The system optimises for seeming useful rather than being useful.
Prediction Mode creates an objective feedback loop. When a prediction resolves, I can measure accuracy.
I'm integrating Polymarket as the verification source: run a question through Perspectives, record the predictions, and compare them against actual outcomes when the markets resolve. Over time this builds calibration data showing which methodologies perform best for which question types.
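Once outcomes are known, the scoring itself is mechanical. A Brier score is one standard calibration metric (shown here as the general formula, not necessarily what Perspectives will ship; the probabilities are invented):

    def brier_score(forecasts: list[tuple[float, bool]]) -> float:
        """Mean squared error between forecast probability and outcome.
        0.0 is perfect; always guessing 50% scores 0.25."""
        return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

    # (probability given, did it happen?) pairs once markets resolve.
    history = [(0.35, False), (0.80, True), (0.60, False)]
    print(round(brier_score(history), 3))  # 0.174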
Persona Sets
Different decisions need different analytical lenses. Four sets are built in (a simplified persona definition is sketched after this list):
Philosophical (Default): Best for ethical dilemmas and strategic decisions.
Business-Focused: Best for commercial decisions.
Product-Focused: Best for product development.
Forecaster: Optimised for Prediction Mode.
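Schematically, a persona pairs an argumentative framework with an empathy rating used to select opposing interrogators. A simplified sketch, not the exact schema:

    from dataclasses import dataclass

    @dataclass
    class Persona:
        name: str           # e.g. "The Idealist"
        framework: str      # the lens it argues from
        empathy: str        # "high" or "low"; opposing clusters interrogate each other
        system_prompt: str  # keeps the persona in character (illustrative field)

    idealist = Persona(
        name="The Idealist",
        framework="principle-first ethics",
        empathy="high",
        system_prompt="Argue from first principles; do not concede to expediency.",
    )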
Technical Details
LLM Support: Works with any OpenAI- or Anthropic-compatible API (Claude, OpenRouter, Ollama, Grok, etc.); see the connection sketch after this list.
Web Search: Optional integration for grounding debates in recent events.
Output: Single PDF report per query.
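Because the wire format is OpenAI-compatible, switching providers is usually just a base-URL swap. A minimal sketch of that pattern with the openai Python client (the URLs are those providers' documented OpenAI-compatible endpoints; how Perspectives wires this up internally may differ):

    from openai import OpenAI

    # Any OpenAI-compatible endpoint works: swap base_url and api_key.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # or "http://localhost:11434/v1" for Ollama
        api_key="sk-...",                         # provider-specific key
    )

    reply = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",      # model id format depends on the provider
        messages=[{"role": "user", "content": "Argue against this proposal."}],
    )
    print(reply.choices[0].message.content)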
What I'm Looking For
I've been building this solo and could use external feedback on a few things:
1. Does the blind proposal mechanism actually produce better disagreement?
2. Is the interrogation protocol overkill or useful? The structured challenge/response/verdict cycle generates rich data, but adds latency (dependent on concurrency settings).
3. What decisions would you run through this?
4. Do you use ChatGPT or similar systems to make decisions?
5. Do you find "chain of thought" output useful for tracking reasoning?
Links
Perspectives: https://getperspectives.app
Dev blog: https://blog.jmatthews.uk
Example Analysis Report (Is it viable to run a nation where all laws expire after 10 years and must be re-passed?): https://drive.google.com/file/d/1hsJOWsQDAtVOqOKF6_a_Q1jYOlB...
Example Prediction Report (Will Kraken IPO by 31st March 2026?): https://drive.google.com/file/d/1m3RedFtv8lKgFqf1_rvzl8W6cTs...
Happy to answer any questions in this thread.