Is anyone still using LLM APIs?

Open models like SmolLM3 (~3B) and Qwen2-1.5B are getting surprisingly capable - and they run fine on laptops or even phones. With Apple rolling out on-device LLMs in iOS 18, it feels like we’re entering a real local-first phase.

Small models already handle focused jobs: lightweight copilots, captioning, inspection.

And not just text - Gemma 2 2B Vision and Qwen2-VL can caption and reason about images locally.

Hardware’s there too: Apple rates the M4’s Neural Engine at 38 TOPS, and consumer GPUs chew through 4-8B models.

Tooling’s catching up fast:

* Ollama for local runtimes (GGUF, simple CLI)
* Cactus / RunLocal for mobile
* ExecuTorch / LiteRT for on-device inference
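For anyone who hasn't tried the local-runtime route, here is a minimal sketch of calling a model through Ollama's local HTTP API from Python, using only the standard library. It assumes Ollama is running on its default port (11434) and that a small model has already been pulled; the `qwen2:1.5b` tag and the helper names `build_request` and `generate` are illustrative choices, not something from the post.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a stock install, nothing remapped).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2:1.5b"):
    """Build a non-streaming generate request. The model tag is an assumption."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2:1.5b", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `print(generate("Summarize in one line: local models are getting good."))` should print a completion. Nothing leaves the machine, which is the whole point.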

Still some pain: iOS memory limits, packaging overhead, distillation quirks. Quantization helps, but 4-bit isn’t magic.
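To put rough numbers on the memory point, here is a back-of-the-envelope sketch of weight memory at different precisions. It counts weights only: KV cache, activations, and runtime overhead are ignored, and real 4-bit formats typically store a bit more than 4 bits per weight because of per-block scales. The function name and the parameter counts are illustrative.

```python
def weight_memory_gib(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Rough footprints for the model sizes mentioned in this thread.
for label, params in [("1.5B", 1.5), ("3B", 3.0), ("8B", 8.0)]:
    fp16 = weight_memory_gib(params, 16)
    int4 = weight_memory_gib(params, 4)
    print(f"{label}: fp16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB")
```

Even at 4 bits, an 8B model needs several GiB just for weights, which is why phone-class memory budgets push toward the 1-3B range.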

The upside’s clear: privacy by default, offline by design, no network round trips, no token bills.

The cloud won’t die, but local compute finally feels fun again.

What’s keeping small models from going fully on-device?
