It seems that Qwen3 is not capable of driving independent reasoning - it lacks the quality needed to power fully autonomous AI agents.

Initially I was quite impressed with its problem-solving capabilities when it output code through the chat interface. It addressed certain problems much better than Claude or Gemini. However, as soon as I switched to Alibaba Cloud's API to provide a Dashscope-based implementation of the cognizer interface for my new generation of AI agents (chain of code), the charm was gone.

Qwen3 struggles with structured generation, quite often falling into an infinite loop while emitting tokens.
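As a rough illustration of how an agent harness might notice this kind of looping, here is a minimal Python sketch. It is not part of my actual setup: the function name `has_repeating_tail` and its thresholds are hypothetical, and it assumes the streamed output is available as a list of token strings.

```python
def has_repeating_tail(tokens, max_period=20, min_repeats=4):
    """Return True if the end of `tokens` is one chunk repeated `min_repeats` times.

    Hypothetical helper: `max_period` and `min_repeats` are arbitrary
    thresholds chosen for illustration, not tuned values.
    """
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(tokens) < needed:
            break  # longer periods need even more tokens, so stop here
        tail = tokens[-needed:]
        chunk = tail[:period]
        # Check that every consecutive slice of length `period` equals the first one.
        if all(tail[i:i + period] == chunk for i in range(0, needed, period)):
            return True
    return False
```

A caller could run this check every few tokens while streaming and abort the request once it returns True, rather than waiting for the token limit.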

It has trouble crossing language boundaries, which is crucial for my agents, since they are "thinking in code": writing Kotlin script containing JavaScript, containing SQL, etc. Therefore it will not work well as an automated software engineer.

It is "stubborn": even when the syntax error in the generated code is clearly indicated, it is rather willing to output the same erroneous code again and again instead of testing another hypothesis.
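To make the "same code again and again" failure concrete, a feedback loop can count identical attempts and bail out. The following is only a sketch under assumptions: `RepeatGuard` and `max_repeats` are hypothetical names, and "identical" here means equal after collapsing whitespace, which is a simplification.

```python
import hashlib


class RepeatGuard:
    """Counts how often the same generated code has been seen (hypothetical helper)."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = {}

    def should_stop(self, generated_code):
        # Collapse whitespace so trivially reformatted attempts count as the same.
        normalized = " ".join(generated_code.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self.counts[key] = self.counts.get(key, 0) + 1
        # Stop once the same attempt has been produced more than `max_repeats` times.
        return self.counts[key] > self.max_repeats
```

In a loop that feeds the compiler error back to the model, calling `should_stop` on each new attempt gives a point at which to change the prompt or give up instead of paying for further identical retries.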

It lacks theory of mind and an understanding of the context and the environment. For example, when asked to check the recent news, it always responds by trying to use a BBC API URL with an unfilled API key as part of the request, while passing this URL to the Files tool instead of the WebBrowser tool, which obviously fails.

And last but not least, censorship: for example, Qwen3 will refuse to search for information on the most recent anti-government protests in China. I wouldn't be surprised if these censorship blockers were partially responsible for the poor quality of cognition in other areas.

Maybe I'm doing something wrong, and you are getting much better results with this model for fully autonomous agents with a feedback loop?
