Ask HN: Perplexity AI Cheating on Model?

1•freakynit•5mo ago

See screenshot: https://postimg.cc/YL0d6rvP

It seems to be using its own model, most likely a very small and inexpensive one.

I became suspicious because it was responding very poorly, even to simple PostgreSQL/Node.js queries, which is not typical of GPT-5. I have plenty of experience using GPT-5 and know the quality of answers it usually provides.

Has anyone else experienced the same issue?

Comments

freakynit•5mo ago

Following up: I asked same model name question directly to gpt-5 using ChatGPT, and it responsed properly:

" I’m GPT-5, the latest generation model from OpenAI. "

begemotz•5mo ago

Anecdotally I have found that the quality of perplexity.ai has gone down significantly over the past year. It seems as though (at least for the free version) - it is nothing more than natural language web-search now.

BoredPositron•5mo ago

Asking an LLM about itself is not an reliable way to find out with which model you are interacting with. If you look at the GPT5 system prompt you'll see that the model knows which model it is because it's written in the system prompt. If you use the API as Perplexity is doing you can write whatever you want in the system prompt. If I write you are "Deep Thought an LLM developed by Douglas Adams" in my system prompt it will "think" it's Deep Thought.

freakynit•5mo ago

That's true. But then all models exposed by Perplexity AI should be affected, not just gpt-5. This is not the case.

---

I switched to Sonnet and asked same question, and it responsed with::: "I'm Claude, an AI assistant created by Anthropic"

---

for grok-4, it was something even worse. Here is what was in the "thinking tokens"::: "I'm identifying myself as an AI language model by OpenAI, specifically based on GPT-4 architecture." ... but, the final answer produced this::: "I’m Perplexity’s AI assistant. I use a combination of advanced large-language-model back-ends (including GPT-4o and Claude 4 Sonnet) and Perplexity’s own retrieval system to answer your questions in real time."

---

For O3, here was it's response::: "I’m a large-language AI assistant built by Perplexity, powered by state-of-the-art models such as GPT-4o."

BoredPositron•5mo ago

As you see every model answers differently and you are still trusting their outputs which is the main problem. You can't determine which model is used by asking the model itself.

freakynit•5mo ago

It's not just the model name part. The whole reason I even asked such a question was because the outputs it was generating were absolute garbage. Like something that will be generated by a super super small 3B model. even after giving it direct hints on why it was wrong, it kept uttering nonsense. I repeated the same in chatgpt and with sonnet models... they both gave sensible, expected outputs.

This is not about model name question, this is about actual severe quality deterioration. Till yesterday, this was all fine.

I believe they are ding A/B testing. More models will start to show similar behaviours. And this will be specific to India. I might have guessed the reason too:

1. https://indianexpress.com/article/technology/techook/airtel-...

2. https://www.financialexpress.com/life/technology-free-perple...

3. https://www.news18.com/tech/airtels-free-ai-offer-with-perpl...

With 300+ million Airtel users in India, I believe they are trying to reduce costs since customers have already been acquired.

BoredPositron•5mo ago

You are still making assumptions without empirical data.

freakynit•5mo ago

Assumptions on the reasoning part: yes.

On the quality part: No.

Same Surface, Different Weight

The Rise of Spec Driven Development

The first good Raspberry Pi Laptop

Seas to Rise Around the World – But Not in Greenland

Will Future Generations Think We're Gross?

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Same Surface, Different Weight

The Rise of Spec Driven Development

The first good Raspberry Pi Laptop

Seas to Rise Around the World – But Not in Greenland

Will Future Generations Think We're Gross?

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Ask HN: Perplexity AI Cheating on Model?

Comments