frontpage.

Show HN: Medinilla – an OCPP compliant .NET back end (partially done)

https://github.com/eliodecolli/Medinilla
2•rhcm•2m ago•0 comments

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6157066
1•dkga•3m ago•1 comments

Resistance Infrastructure

https://www.profgalloway.com/resistance-infrastructure/
2•samizdis•7m ago•0 comments

Fire-juggling unicyclist caught performing on crossing

https://news.sky.com/story/fire-juggling-unicyclist-caught-performing-on-crossing-13504459
1•austinallegro•7m ago•0 comments

Restoring a lost 1981 Unix roguelike (protoHack) and preserving Hack 1.0.3

https://github.com/Critlist/protoHack
2•Critlist•9m ago•0 comments

GPS and Time Dilation – Special and General Relativity

https://philosophersview.com/gps-and-time-dilation/
1•mistyvales•12m ago•0 comments

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

https://github.com/writerslogic/witnessd
1•davidcondrey•13m ago•1 comments

Show HN: I built a clawdbot that texts like your crush

https://14.israelfirew.co
2•IsruAlpha•15m ago•1 comments

Scientists reverse Alzheimer's in mice and restore memory (2025)

https://www.sciencedaily.com/releases/2025/12/251224032354.htm
1•walterbell•18m ago•0 comments

Compiling Prolog to Forth [pdf]

https://vfxforth.com/flag/jfar/vol4/no4/article4.pdf
1•todsacerdoti•19m ago•0 comments

Show HN: Cymatica – an experimental, meditative audiovisual app

https://apps.apple.com/us/app/cymatica-sounds-visualizer/id6748863721
1•_august•20m ago•0 comments

GitBlack: Tracing America's Foundation

https://gitblack.vercel.app/
2•martialg•20m ago•0 comments

Horizon-LM: A RAM-Centric Architecture for LLM Training

https://arxiv.org/abs/2602.04816
1•chrsw•21m ago•0 comments

We just ordered shawarma and fries from Cursor [video]

https://www.youtube.com/shorts/WALQOiugbWc
1•jeffreyjin•22m ago•1 comments

Correctio

https://rhetoric.byu.edu/Figures/C/correctio.htm
1•grantpitt•22m ago•0 comments

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•26m ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•27m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•27m ago•0 comments

FDA intends to take action against non-FDA-approved GLP-1 drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
21•randycupertino•28m ago•11 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
3•janandonly•30m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•31m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•31m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•31m ago•0 comments

Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•39m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
13•karakoram•40m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•40m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•40m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•41m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•43m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•43m ago•0 comments

Shimmy v1.7.0: Running 42B MoE Models on Consumer GPUs with 99.9% VRAM Reduction

https://github.com/Michael-A-Kuykendall/shimmy/releases/tag/v1.7.0
3•MKuykendall•4mo ago

Comments

MKuykendall•4mo ago
I just released Shimmy v1.7.0 with MoE (Mixture of Experts) CPU offloading support, and the results are pretty exciting for anyone who's hit GPU memory walls.

What this solves

If you've tried running large language models locally, you know the pain: a 42B-parameter model typically needs 80GB+ of VRAM, putting it out of reach for most developers. Even "smaller" 20B models often require 40GB+.

The breakthrough

MoE CPU offloading intelligently moves expert layers to CPU while keeping active computation on GPU. In practice:

- Phi-3.5-MoE 42B: runs on 8GB consumer GPUs (was impossible before)
- GPT-OSS 20B: 71.5% VRAM reduction (15GB → 4.3GB, measured)
- DeepSeek-MoE 16B: down to 800MB VRAM with Q2 quantization

The tradeoff is 2-7x slower inference, but you can actually run these models instead of not running them at all.

Technical implementation

- Built on enhanced llama.cpp bindings with new with_cpu_moe() and with_n_cpu_moe(n) methods (a rough sketch of how these map onto the CLI flags follows at the end of this post)
- Two CLI flags: --cpu-moe (automatic) and --n-cpu-moe N (manual control)
- Cross-platform: Windows MSVC CUDA, macOS Metal, Linux x86_64/ARM64
- Still a sub-5MB binary with zero Python dependencies

Ready-to-use models

I've uploaded 9 quantized models to HuggingFace specifically optimized for this:

- Phi-3.5-MoE variants (Q8_0, Q4_K_M, Q2_K)
- DeepSeek-MoE variants
- GPT-OSS 20B baseline

Getting started

# Install
cargo install shimmy

# Download a model
huggingface-cli download MikeKuykendall/phi-3.5-moe-q4-k-m-cpu-offload-gguf

# Run with MoE offloading
./shimmy serve --cpu-moe --model-path phi-3.5-moe-q4-k-m.gguf

Standard OpenAI-compatible API, so existing code works unchanged (a minimal client sketch follows at the end of this post).

Why this matters

This democratizes access to state-of-the-art models. Instead of needing a $10,000 GPU or cloud spend, you can run expert models on gaming laptops or modest server hardware. It's not just about making models "work" - it's about sustainable AI deployment where organizations can experiment with cutting-edge architectures without massive infrastructure investments.

The technique itself isn't novel (llama.cpp had MoE support), but the Rust bindings, production packaging, and curated model collection make it accessible to developers who just want to run large models locally.

Release: https://github.com/Michael-A-Kuykendall/shimmy/releases/tag/...
Models: https://huggingface.co/MikeKuykendall

Happy to answer questions about the implementation or performance characteristics.
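
To make the two offloading knobs concrete, here is a minimal, self-contained Rust sketch of what the builder-style calls might look like. Only with_cpu_moe() and with_n_cpu_moe(n) are named in the release notes; the ModelParams type and everything else below are stand-ins invented for illustration, not shimmy's actual API.

// Hypothetical sketch only: the release names with_cpu_moe() and
// with_n_cpu_moe(n); the ModelParams type below is a local stand-in,
// not shimmy's real API, and exists just to show how the two knobs
// correspond to the --cpu-moe and --n-cpu-moe N CLI flags.
#[derive(Debug, Default)]
struct ModelParams {
    cpu_moe: bool,          // offload expert layers to CPU automatically
    n_cpu_moe: Option<u32>, // or pin an explicit number of expert layers to CPU
}

impl ModelParams {
    // Equivalent of the --cpu-moe flag: let the loader decide placement.
    fn with_cpu_moe(mut self) -> Self {
        self.cpu_moe = true;
        self
    }

    // Equivalent of --n-cpu-moe N: keep exactly `n` expert layers on the CPU.
    fn with_n_cpu_moe(mut self, n: u32) -> Self {
        self.n_cpu_moe = Some(n);
        self
    }
}

fn main() {
    let automatic = ModelParams::default().with_cpu_moe();
    let manual = ModelParams::default().with_n_cpu_moe(12);
    println!("automatic: {automatic:?}\nmanual: {manual:?}");
}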
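
On "existing code works unchanged": here is a minimal Rust client sketch against the OpenAI-style chat completions endpoint. It assumes reqwest (with the blocking and json features) and serde_json as dependencies; the base URL, port, and model id are placeholders - use whatever address shimmy reports on startup and the model you actually loaded.

// Minimal client sketch for an OpenAI-compatible chat completions endpoint.
// Assumed dependencies: reqwest = { version = "0.12", features = ["blocking", "json"] }
// and serde_json.
use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumption: replace with the address/port shimmy prints when `shimmy serve` starts.
    let base_url = "http://localhost:11435";

    let request = json!({
        "model": "phi-3.5-moe-q4-k-m",  // placeholder: the model you loaded
        "messages": [
            { "role": "user", "content": "Explain MoE CPU offloading in one sentence." }
        ]
    });

    let response: Value = reqwest::blocking::Client::new()
        .post(format!("{base_url}/v1/chat/completions"))
        .json(&request)
        .send()?
        .error_for_status()?
        .json()?;

    // Standard OpenAI chat-completions response shape.
    println!("{}", response["choices"][0]["message"]["content"]);
    Ok(())
}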