
LLMs are powerful, but enterprises are deterministic by nature

2•prateekdalal•27m ago•0 comments

Ask HN: Anyone Using a Mac Studio for Local AI/LLM?

44•UmYeahNo•1d ago•28 comments

Ask HN: Ideas for small ways to make the world a better place

12•jlmcgraw•13h ago•18 comments

Ask HN: Non AI-obsessed tech forums

23•nanocat•11h ago•18 comments

Ask HN: 10 months since the Llama-4 release: what happened to Meta AI?

44•Invictus0•1d ago•11 comments

Ask HN: Non-profit, volunteer-run org needs CRM. Is Odoo Community a good sol.?

2•netfortius•8h ago•1 comment

Ask HN: Who wants to be hired? (February 2026)

139•whoishiring•4d ago•514 comments

AI Regex Scientist: A self-improving regex solver

6•PranoyP•15h ago•1 comment

Ask HN: Who is hiring? (February 2026)

312•whoishiring•4d ago•511 comments

Tell HN: Another round of Zendesk email spam

104•Philpax•2d ago•54 comments

Ask HN: Is Connecting via SSH Risky?

19•atrevbot•2d ago•37 comments

Ask HN: Has your whole engineering team gone big into AI coding? How's it going?

17•jchung•2d ago•12 comments

Ask HN: Why LLM providers sell access instead of consulting services?

4•pera•21h ago•13 comments

Ask HN: What is the most complicated Algorithm you came up with yourself?

3•meffmadd•23h ago•7 comments

Ask HN: How does ChatGPT decide which websites to recommend?

5•nworley•1d ago•11 comments

Ask HN: Any International Job Boards for International Workers?

2•15charslong•10h ago•2 comments

Ask HN: Is it just me or are most businesses insane?

7•justenough•1d ago•6 comments

Ask HN: Mem0 stores memories, but doesn't learn user patterns

9•fliellerjulian•2d ago•6 comments

Ask HN: Is there anyone here who still uses slide rules?

123•blenderob•3d ago•122 comments

Kernighan on Programming

170•chrisjj•4d ago•61 comments

Ask HN: Anyone Seeing YT ads related to chats on ChatGPT?

2•guhsnamih•1d ago•4 comments

Ask HN: Does global decoupling from the USA signal comeback of the desktop app?

5•wewewedxfgdf•1d ago•2 comments

We built a serverless GPU inference platform with predictable latency

5•QubridAI•2d ago•1 comment

Ask HN: How Did You Validate?

4•haute_cuisine•1d ago•4 comments

Ask HN: Does a good "read it later" app exist?

8•buchanae•3d ago•18 comments

Ask HN: Have you been fired because of AI?

17•s-stude•4d ago•15 comments

Ask HN: Cheap laptop for Linux without GUI (for writing)

15•locusofself•3d ago•16 comments

Ask HN: Anyone have a "sovereign" solution for phone calls?

12•kldg•3d ago•1 comment

Test management tools for automation heavy teams

2•Divyakurian•1d ago•2 comments

Ask HN: OpenClaw users, what is your token spend?

14•8cvor6j844qw_d6•4d ago•6 comments

Ask HN: Are you running local LLMs? What are your key use cases?

16•briansun•6mo ago
2025 feels like a breakout year for local models. Open‑weight releases are getting genuinely useful: from Google’s Gemma to recent *gpt‑oss* drops, the gap with frontier commercial models keeps narrowing for many day‑to‑day tasks.

Yet outside of this community, local LLMs still don’t seem mainstream. My hunch: *great UX and durable apps are still thin on the ground.*

If you are using local models, I’d love to learn from your setup and workflows. Please be specific so others can calibrate:

Model(s) & size: exact name/version, and quantization (e.g., Q4_K_M).

Runtime/tooling: e.g., Ollama, LM Studio, etc.

Hardware: CPU/GPU details (VRAM/RAM), OS. If it's a laptop, edge device, or home server, mention that.

Workflows where local wins: privacy/offline, data security, coding, high‑volume data extraction, RAG over your files, agents/tools, screen capture processing—what's actually sticking for you?

Pain points: quality on complex reasoning, context management, tool reliability, long‑form coherence, energy/thermals, memory, Windows/Mac/Linux quirks.

Favorite app today: the one you actually open daily (and why).

Wishlist: the app you wish existed.

Gotchas/tips: config flags, quant choices, prompt patterns, or evaluation snippets that made a real difference.

If you’re not using local models yet, what’s the blocker—setup friction, quality, missing integrations, battery/thermals, or just “cloud is easier”? Links are welcome, but what helps most is concrete numbers and anecdotes from real use.

A simple reply template (optional):

```
Model(s):
Runtime/tooling:
Hardware:
Use cases that stick:
Pain points:
Favorite app:
Wishlist:
```

Also curious how people think about privacy and security in practice. Thanks!

Comments

incomingpain•6mo ago
Python coding is practically the only use case for local for me.

Cloud LLMs are able to run 1 trillion parameters and have all of Python knowledge in a transparent RAG that's 100 Gbit or faster. Of course they'll be the bestest on the block.

But the new GPT-OSS coding benchmarks are only barely behind Grok 4 or GPT-5 with high reasoning.

>Model(s) & size: exact name/version, and quantization (e.g., Q4_K_M).

My most reliable setup is Devstral + OpenHands: Unsloth Q6_K_XL, 85,000 context, flash attention, K-cache and V-cache quant at Q8.

Second most reliable: GPT-OSS-20B + opencode. Default MXFP4; I can only load 31,000 context or it fails (still plenty, but hoping this bug gets fixed), and you can't use flash attention or K/V-cache quantization or it becomes dumb as rocks. This Harmony stuff is annoying.

Still preliminary (just got it working today), but testing looks really good: Qwen3-30B-A3B-Thinking-2507 + Roo Code or Qwen Code, 80,000 context, Unsloth Q4_K_XL, flash attention, K-cache and V-cache quant at Q8.

>Runtime/tooling: e.g., Ollama, LM studio, etc.

LM Studio. I need Vulkan for my setup; ROCm is just a pain in the ass. They need to support way more Linux distros.

24 GB VRAM.
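For anyone who wants to wire an agent like OpenHands or opencode to a setup like this, LM Studio exposes an OpenAI-compatible server. A minimal sketch, assuming the `openai` Python package, LM Studio's default port 1234, and an illustrative model id:

```
# Point any OpenAI-compatible client at LM Studio's local server
# (default http://localhost:1234/v1). The api_key can be any string;
# the model id below is illustrative, use whatever id LM Studio shows.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="devstral-small",  # hypothetical id
    messages=[{"role": "user", "content": "Write a Python fizzbuzz."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```

The same base URL works for any agent tool that accepts an OpenAI-compatible endpoint.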

briansun•6mo ago
Super useful config dump—thanks. Do you have wall‑clock numbers for prefill/gen tokens/sec and power draw on the 24GB card for those three setups? Also curious where quality starts to degrade vs. context length in your tests.
doppelgunner•6mo ago
Tried running a local LLM and it felt like adopting a pet dragon. Fun at first, but then it keeps eating all my GPU and still refuses to clean up its own context window.
briansun•6mo ago
Haha, a cute pet dragon. Two knobs that helped me tame VRAM: KV‑cache quant/eviction and sliding‑window attention (if your runtime supports them). What model/runtime and context are you running when it tips over? Are you using Ollama?
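For the Ollama case specifically, the two knobs that most affect VRAM can be set per request. A minimal sketch, assuming the `ollama` Python package and a locally pulled model tag; option names follow Ollama's Modelfile parameters:

```
# Cap the two big VRAM consumers in Ollama per request:
# num_ctx (context window, which sets KV-cache size) and
# num_gpu (how many layers are offloaded to the GPU).
import ollama

resp = ollama.chat(
    model="gemma3n",  # any locally pulled model tag
    messages=[{"role": "user", "content": "Summarize KV-cache quantization."}],
    options={
        "num_ctx": 8192,  # smaller context -> smaller KV cache in VRAM
        "num_gpu": 24,    # layers offloaded to GPU; the rest stay in RAM
    },
)
print(resp["message"]["content"])
```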
dabockster•6mo ago
LLMs in general are going to make C/C++ development viable again for the masses. A big reason everything went to web frameworks was memory safety: even big frameworks like Electron and React Native are memory-safe overall. Writing low-level code by hand was a migraine experience, even for experienced devs. AI changes this relationship entirely, even if you just use it as a pair programmer. I've had even small 8B models correctly call out memory-unsafe code and suggest fixes. Larger models are even better.

Local LLMs? A new renaissance. All that power without having to pinky swear with a cloud provider that they won't just take your generated code and use it for themselves.

Expect to see some awesome Windows and Mac apps being developed in the coming months and years. 100% on device, memory safe, and with a thin resource footprint. The 1990s/2000s are coming back.
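A toy way to test that claim against your own setup: hand a small local model a deliberately memory-unsafe C function and see whether it flags the overflow. A sketch assuming a local Ollama daemon; the model tag is illustrative:

```
# Ask a small local model to review a deliberately memory-unsafe
# C snippet. Assumes a local Ollama daemon; swap in any ~8B
# instruct model you have pulled.
import ollama

UNSAFE_C = """
#include <stdlib.h>
#include <string.h>

char *greet(const char *name) {
    char *buf = malloc(8);
    strcpy(buf, "Hello, ");  /* exactly fits: 7 chars + NUL */
    strcat(buf, name);       /* heap overflow for any non-empty name */
    return buf;              /* caller must free(); easy to forget */
}
"""

resp = ollama.chat(
    model="llama3.1:8b",  # illustrative tag
    messages=[{
        "role": "user",
        "content": "Review this C function for memory-safety bugs:\n" + UNSAFE_C,
    }],
)
print(resp["message"]["content"])
```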

briansun•6mo ago
Thanks, this is genuinely encouraging; I'd assumed AI help was strongest on front-end work (web apps/SwiftUI), so this is my first concrete example of an LLM catching memory-unsafe C/C++. Could you share your toolchain (CLI/IDE integration) and model details (name/quant/runtime), and what "awesome" 100% on-device Windows/Mac apps you most want to see?
dabockster•5mo ago
I've been using a lot of the Chinese open source models like R1 and the Qwen coder series. I've also been trying some community mods of them from HuggingFace.

I was using a combo of Ollama + Roo Code for the front/back end but Ollama is kind of dumb when it comes to memory overload protections. I've since switched to LM Studio and it has a very annoying hard timeout of 2-3 minutes on its API server. Local isn't perfect right now, but when it does run you can see the potential as clear as day.

cdaringe•6mo ago
I hope so too. I've been using Claude in some medium-sized C projects and tbh it's still not excellent. YMMV. Bolting it onto existing projects has been difficult. It doesn't do a great job of understanding the build graph, nor the dependency relationships between the host system's libraries (even with coaching, particularly in the Mac/ARM space). It does pretty well at crushing impl code, but the infrastructure and setup pieces are still lagging. I have a counterexample on a smaller project, for which I was programming an E Ink display, and it was able to understand all of the dependencies quite well, but that's mainly because they were localized, not heavily reliant on any sort of system libraries, and the build system was much simpler.
dabockster•5mo ago
The Chinese models like DeepSeek R1 and the Qwen Coder series have been way better at coding than any of the Western models. And they're open source, too.

32-billion-parameter R1 and Qwen models usually blow GPT, Gemini, and Claude out of the water for coding tasks. And they can be made to run on a standard off-the-shelf gaming PC from somewhere like Best Buy or Costco. They'll run really, really slowly, though, but they will run.

lbhdc•6mo ago
I almost exclusively self host the models I use.

Currently I am using llama.cpp for an interactive REPL chat. I was previously using Alpaca (a GTK GUI), but was annoyed by how slow it was and by some random crashes. I am transitioning some of this to self-hosted in the cloud for things that can't run on my laptop.

I am looking to get away from my current interface and write my own, mostly for the experience of deeply integrating agents into a program. If anyone knows a good library for interacting with a local model that doesn't involve standing up a webserver, I am interested :)

My daily driver is gemma3n. It's been a nice balance between speed and performance without spinning up my laptop fans.

I am super interested in local models, partially because there is no friction from managed services, but also because I think as small models become more viable we will see an explosion of apps incorporating them.
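On the wish for a library that doesn't involve standing up a webserver: llama-cpp-python runs a GGUF model fully in-process. A minimal sketch; the model path and settings are illustrative:

```
# In-process inference, no server: llama-cpp-python loads a GGUF
# file directly. Assumes `pip install llama-cpp-python`; the path
# and parameters below are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-3n-e2b-it-q4_k_m.gguf",  # hypothetical filename
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Suggest three REPL UX ideas."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```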

briansun•6mo ago
Gemma3n as a daily driver sounds nice. E2B or E4B, and rough tokens/sec on your laptop? And have you A/B-tested code-generation quality across local models (e.g., Gemma3n vs. others)?
lbhdc•6mo ago
I am using the smaller one, specifically the e2b-it flavor.

I get ~20-30 tok/sec. It's fast enough that it's not frustrating, but if it were faster you could more easily skim as it generates.

I haven't done any serious testing. My process is typically learning about new models on HN or elsewhere and trying to give them a real shake. I have some go-to code-generation prompts that I try on all of them. None succeed, but they are getting close. I also do a lot of just feeling it out. The more I can use solutions unedited, the better it feels.
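For rough tok/s numbers that are comparable across machines, Ollama already returns timing fields with each response, so a decode-speed measurement is only a few lines. A sketch assuming a local daemon; field names are from the Ollama generate API:

```
# Rough decode speed from the timing fields Ollama returns:
# eval_count tokens generated over eval_duration nanoseconds.
import ollama

resp = ollama.generate(model="gemma3n", prompt="Explain RAG in one paragraph.")
tok_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"decode: {tok_per_sec:.1f} tok/s")
```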

muzani•6mo ago
Restrictive privacy laws.

Also log into your Claude/OpenAI dashboard and read the logs. Now they log every damn thing that goes through the API and keep it there for a minimum of 30 days without any option to delete (unless you're enterprise). No anonymization or anything. Just raw audit logs.

briansun•6mo ago
Thanks for raising the privacy angle. Do you have a source or plan details for the 30‑day retention and the lack of deletion options (non‑enterprise)? It would help to know account tier and where that policy is documented.

Beyond policy, how are you actually using local LLMs—what tasks do you run locally vs. in the cloud—and which scenarios feel most privacy‑sensitive to you (e.g., proprietary code, contracts, health notes)?

roscas•6mo ago
Qwen3-Coder for programming help, as I'm not very good at CSS or Flask. Sometimes I test other models just for fun, but only for coding or sysadmin.