Right now the box boots into a very simple web UI where you choose a model and start using it. The API is OpenAI-compatible for chat completions and embeddings. It runs different models depending on the hardware you pick, either a Jetson Orin Nano or an x86 mini-PC with a GPU. All data is stored locally, it supports basic RAG indexing, and it's only exposed on the LAN by default.
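To show what "OpenAI-compatible" means in practice, here's a minimal sketch of talking to the box from the standard openai Python client. The LAN address, port, and model name are placeholders I made up for illustration, not the box's real defaults.

    # Minimal sketch: point the standard OpenAI client at the box on the LAN.
    # The address, port, and model name are placeholders, not the real defaults.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://192.168.1.50:8080/v1",  # hypothetical LAN address of the box
        api_key="not-needed-locally",            # no real key required on the LAN
    )

    # Chat completion against whichever model was selected in the web UI.
    resp = client.chat.completions.create(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": "Summarize our Q3 incident notes."}],
    )
    print(resp.choices[0].message.content)

    # Embeddings go through the same compatible surface.
    emb = client.embeddings.create(model="nomic-embed-text", input="hello world")
    print(len(emb.data[0].embedding))

The idea is that existing code written against api.openai.com should only need the base_url swapped to start running against the box.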
A few things still aren't there yet. There's no multi-user rate limiting. RAG quality is basic, and I'm still improving chunking and reranking. The Orin runs hot under heavy load, so thermal performance needs work. And it's still a prototype rather than a finished consumer product.
On the technical side, it runs containerized model servers using Ollama plus some custom runners. Models load as GGUF or through TensorRT-LLM depending on the hardware. The API layer follows the OpenAI spec. The RAG pipeline uses local embeddings and a vector database, and the software stack is a mix of TypeScript and Python.
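To make the RAG side concrete, here's a rough sketch of just the retrieval step under some assumptions: an Ollama-style /api/embeddings endpoint reachable on the box, an in-memory brute-force search standing in for the actual vector database, and a placeholder address and embedding model I invented for the example.

    # Rough sketch of retrieval: embed -> cosine similarity -> top-k chunks.
    # Assumes an Ollama-style embeddings endpoint; the real pipeline uses a
    # vector database instead of this brute-force in-memory scan.
    import numpy as np
    import requests

    OLLAMA_URL = "http://192.168.1.50:11434/api/embeddings"  # placeholder address
    EMBED_MODEL = "nomic-embed-text"                         # placeholder model

    def embed(text: str) -> np.ndarray:
        r = requests.post(OLLAMA_URL, json={"model": EMBED_MODEL, "prompt": text})
        r.raise_for_status()
        return np.array(r.json()["embedding"], dtype=np.float32)

    def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
        q = embed(query)
        q /= np.linalg.norm(q)
        scored = []
        for chunk in chunks:
            v = embed(chunk)
            v /= np.linalg.norm(v)
            scored.append((float(q @ v), chunk))
        scored.sort(reverse=True)
        return [chunk for _, chunk in scored[:k]]

    # The top chunks then get stuffed into the chat completion as context.

This is the naive version of what the box does; the chunking and reranking improvements mentioned above sit on either side of this step.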
I'm looking for feedback from anyone who has built or deployed local inference before: what thermal and power issues you've run into, whether a drop-in OpenAI-compatible box is actually useful to small teams, what hardware setups I should consider, and any honest critiques of the idea.