DeepSeek V4 is out. the best open-source on coding. here's the breakdown

3•Alisaqqt•2h ago

Two models: Flash (284B total, 13B active) and Pro (1.6T total, 49B active). both hit 1M token context.

V4-Pro is their flagship. Beats Claude Opus 4.6 Max on Agent coding tasks (their words). specifically calls out being better than Sonnet 4.5 on coding, and competitive with Opus 4.6 on general benchmarks. on world knowledge and STEM, they say it's ahead of Gemini-Pro-3.1.

V4-Flash is the sleeper pick. Faster and cheaper than Pro, but it has better long-context efficiency than Pro does.

Original Text: Agent capabilities massively improved: V4-Pro hits SOTA on Agentic Coding benchmarks among open-source models. In practice, users report it feels better than Sonnet 4.5, and output quality is close to Opus 4.6 non-thinking mode — though there's still a gap vs Opus 4.6 with thinking enabled.

World knowledge: V4-Pro leads all open-source models by a significant margin on knowledge benchmarks, sitting just behind Gemini-Pro-3.1 among closed-source frontier models.

Top-tier reasoning: On math, STEM, and competitive coding, V4-Pro beats every open-source model that's been publicly benchmarked and is trading blows with the best closed-source models in the world. the 1M context is the real headline. Redesigned attention entirely — combines something called DSA (Deeply Sparse Attention) to handle the scale without blowing up compute. V4 inference cost stays flat as tokens scale up vs V3.2 which shoots up. the architecture improvement is what makes this actually usable, not just a spec number.

Agent capabilities got a dedicated upgrade. Trained specifically against Claude Code, OpenClaw, OpenCode, and CodeBuddy. V4-Pro is now the recommended model for any agentic / coding workflow. Flash is explicitly not recommended for the most complex agent tasks.

API is live. Pricing:

DeepSeek-V4-Flash: $0.14 / $0.28 per M input/output tokens

DeepSeek-V4-Pro: $1.74 / $3.48 per M input/output tokens

Reasoning_effort parameter lets you set thinking intensity (low/high/max) per call. "max" is recommended for agent tasks specifically.

The model will launch on Atlas Cloud. Developers can get API access.

Comments

onchainintel•1h ago

Using verified V4 pricing compared to Anthropic Claude:

vs Haiku 4.5: 3.3x cheaper input, 10x cheaper output vs Sonnet 4.6: 10x cheaper input, 30x cheaper output vs Opus 4.7: 17x cheaper input, 50x cheaper output

Mind-blowingly cheaper by comparison.

Do those that deserve the world, get the world?

AI-powered knowledge assistant for sexual and reproductive health

Quest Browser 146.0 adds experimental support for WebGPU in WebXR

The importance of stupidity in scientific research (2008) [pdf]

How Europe regulated itself into American vassalage

Iran War Has Drained U.S. Supplies of Critical, Costly Weapons

Dutch government secures deal with European cloud platform STACKIT

PuzzleScript

Contral AI

What Are Unix Domain Sockets?

Paint But…

GitGuardian analysis of the bitwarden/CLI compromise

Rendezvous and Docking: A User's Guide for Non Rocket Scientists

Microsoft offers buyouts for longtime employees

FujiNet Go 800 – Atari800 Emulator for Android

The Surveillance Accountability Act Full Text [pdf]

OpenAI deprecates all GPT nano fine tuning

Why Not Venus?

Running Bare-Metal Rust Alongside ESP-IDF on the ESP32-S3's Second Core

The Budgeting Mistake That Cost Uber Its Annual AI Spend in 4 Months

Tremendous Iranian Invasion: A Text Misadventure

Essential Voice by Nothing

Familiarity is the enemy: On why Enterprise systems have failed for 60 years

Intel Arc Pro B70 Review

ASML's latest chipmaking gear is too pricey, even for TSMC

Intel Arc Pro B70 benchmarks for LLMs and video generation

DeepSeek's Sequel Set to Extend China's Reach in Open-Source A.I

Ubuntu 26.04 LTS Released

AI Resume Reviewer

Show HN: GitRails-Let agents call only the GitHub endpoints and params you allow