EdgeFoundry – Deploy and Monitor Local LLMs

https://github.com/TheDarkNight21/edge-foundry

2•allaffa•4mo ago

Comments

allaffa•4mo ago

Hey HN,

I’ve been working on EdgeFoundry, an open-source DevOps and observability toolkit that makes it easy to deploy, monitor, and manage local LLMs on your own machine or private server.

What it does

EdgeFoundry helps you:
• Run quantized LLMs locally (like TinyLlama or Phi-3) using LlamaCPP
• Monitor telemetry such as latency, tokens per second, and memory usage
• Use a simple CLI to deploy, start, stop, and view models
• Store and visualize metrics in a local SQLite database and React dashboard
• Keep everything offline-first and privacy-friendly

In short: Ollama runs your model — EdgeFoundry helps you deploy and observe it like a production system.

Key Features (MVP)

• CLI: edgefoundry deploy/start/stop/status
• Local agent (FastAPI + LlamaCPP) to run the model
• Telemetry logging for latency, memory, and token throughput
• Local dashboard (React) for visualizing metrics
• SQLite backend for offline data storage
• Support for TinyLlama and Phi-3 Mini out of the box
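To give a rough idea of what per-inference telemetry logging into SQLite can look like, here is a minimal sketch. It is not EdgeFoundry's actual code: the table name, the column names, and the `init_db` / `record_inference` helpers are all hypothetical, and `fake_generate` is a stand-in so the example runs without LlamaCPP.

```python
import sqlite3
import time
import tracemalloc


def init_db(path=":memory:"):
    """Create a hypothetical telemetry table (not EdgeFoundry's real schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS telemetry (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               model TEXT NOT NULL,
               latency_ms REAL NOT NULL,
               tokens INTEGER NOT NULL,
               tokens_per_sec REAL NOT NULL,
               peak_mem_kb REAL NOT NULL
           )"""
    )
    return conn


def record_inference(conn, model, generate, prompt):
    """Time one call to `generate` and store latency, throughput, and memory."""
    # tracemalloc only sees Python-level allocations; a real agent would
    # likely read process RSS instead, since model weights live outside Python.
    tracemalloc.start()
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    tokens_per_sec = len(tokens) / elapsed if elapsed > 0 else 0.0
    conn.execute(
        "INSERT INTO telemetry "
        "(model, latency_ms, tokens, tokens_per_sec, peak_mem_kb) "
        "VALUES (?, ?, ?, ?, ?)",
        (model, elapsed * 1000.0, len(tokens), tokens_per_sec, peak_bytes / 1024.0),
    )
    conn.commit()
    return tokens


def fake_generate(prompt):
    """Stand-in for a real model call: returns whitespace-split 'tokens'."""
    time.sleep(0.01)  # simulate some inference time
    return prompt.split()
```

Calling `record_inference(init_db(), "tinyllama", fake_generate, "hello local llm world")` writes one row to the table. Swapping `fake_generate` for a real model call is the only change the rest of the sketch would need.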

Why I built this

While building local AI projects like offline RAG assistants, I realized there was no easy way to deploy and track local models with the kind of observability and lifecycle management we have in the cloud. Developers want control, privacy, and insight, but tools like Ollama don't offer monitoring, telemetry, or multi-device orchestration.

EdgeFoundry fills that gap by offering the DevOps and observability layer for edge AI.

Who it’s for

• Developers running quantized models locally
• Teams building offline-first AI apps
• Startups needing on-prem AI for compliance
• Anyone who wants visibility into local LLM performance

Quick Start

# 1. Install
pip install edgefoundry

# 2. Deploy a local model
edgefoundry deploy --model tinyllama-1b-3bit.gguf

# 3. Start the agent
edgefoundry start

# 4. Open the dashboard
edgefoundry dashboard

You’ll see live metrics like latency, memory usage, and tokens per second for each inference.
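If you'd rather look at the numbers without the dashboard, per-model summaries are easy to pull straight out of SQLite. The sketch below assumes a hypothetical `telemetry` table with `model`, `latency_ms`, `tokens_per_sec`, and `peak_mem_mb` columns; EdgeFoundry's real schema may differ, and the sample rows are invented purely so the example runs.

```python
import sqlite3


def demo_conn():
    """Build an in-memory database with made-up sample rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE telemetry ("
        "model TEXT, latency_ms REAL, tokens_per_sec REAL, peak_mem_mb REAL)"
    )
    conn.executemany(
        "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
        [
            ("tinyllama", 120.0, 40.0, 900.0),
            ("tinyllama", 80.0, 60.0, 950.0),
            ("phi-3-mini", 300.0, 20.0, 2400.0),
        ],
    )
    conn.commit()
    return conn


def summarize(conn):
    """Return one summary dict per model: run count, averages, and peak memory."""
    rows = conn.execute(
        """SELECT model,
                  COUNT(*),
                  AVG(latency_ms),
                  AVG(tokens_per_sec),
                  MAX(peak_mem_mb)
           FROM telemetry
           GROUP BY model
           ORDER BY model"""
    ).fetchall()
    return [
        {
            "model": model,
            "runs": runs,
            "avg_latency_ms": avg_latency,
            "avg_tokens_per_sec": avg_tps,
            "max_peak_mem_mb": max_mem,
        }
        for model, runs, avg_latency, avg_tps, max_mem in rows
    ]
```

`summarize(demo_conn())` returns a list with one entry per model, sorted by model name. Pointing `sqlite3.connect` at the real database file instead of `:memory:` would be the obvious next step, once you know the actual table layout.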

Future Plans

The next phase of EdgeFoundry is to enable mass deployment and testing of local AI models across devices. The goal is to make it possible for companies to:
• Deploy local models at scale to phones, laptops, or IoT devices
• Collect telemetry and performance data from real devices or simulations (for example, using Android Studio or local emulators)
• Use this data to evaluate, tune, and monitor model performance before and after rollout

This would let teams building privacy-first or on-device AI systems manage fleets of local deployments with the same level of visibility and control they have in the cloud.

Feedback wanted

This is an early MVP. I’d love feedback on:
• What features you’d want for multi-device orchestration
• Whether cloud sync or over-the-air updates would be useful
• What matters most for large-scale local deployments on phones or computers

GitHub: https://github.com/TheDarkNight21/edge-foundry

If you try it, please share your experience or open an issue. I’m eager to hear from others building privacy-first AI tools or deploying LLMs locally.

Thanks for reading. I’ll be in the comments to answer questions and discuss next steps.
