frontpage.
newsnewestaskshowjobs

Open Source @Github

fp.

Open in hackernews

How to Setup a Local Coding Agent on macOS

https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent-on-macos
71•kkm•1h ago

Comments

cdolan•1h ago
Is there a link to the video? It did not render when I went to the page. Curious about the real-time feel of this
dewey•46m ago
That's the direct link: https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent...
c-hendricks•13m ago
Note this is cut to just before the model responds, so not a great way for people to judge the real-time feel of this.
c-hendricks•59m ago
Not sure you really need huggingface-cli to download anything if you're just using llama.cpp. You can pass `-hf ...` and it will download the models for you. Set `LLAMA_CACHE` to change where the downloads go:

  LLAMA_CACHE="models" ./llama-server \
    -hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
    ...
dofm•55m ago
Yes.

-hfd for the draft model.

c-hendricks•40m ago
Nice, was wondering if there was a flag for the draft as well.

Not knocking huggingface-cli, just find it's much easier for people to try out this stuff when they can just

  mise use --global github:ggml-org/llama.cpp
  LLAMA_CACHE="models" llama-server \
    -hf unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL \
    --host 0.0.0.0 \
    --port 11434 \
    ...
ig0r0•52m ago
I wrote a similar post some time ago just used ollama and opencode https://blog.kulman.sk/running-local-llm-coding-server/
dofm•51m ago
Useful stuff in here that I wish I'd seen a few days ago :-)

I am not convinced that the MTP setup for the QAT model adds very much in terms of speed on my M1 Max, but it is definitely worth experimenting with.

Fiddling about with local models has done so much for my conceptual understanding of what is going on.

FWIW and YMMV but I also found the Gemma 4 MTP head was occasionally breaking markup in Opencode, causing the thinking to display untidily and ultimately in some cases missing the stop token. So I've stopped using MTP there for now.

Recent Qwen 3.6 models have developer role support so it will occasionally surprise you with a structured multiple choice questionnaire.

mft_•33m ago
I found a marginal downside to Qwen3.6-35B-A3B-MTP vs. the non-MTP equivalent on an M1 Max. I’ll maybe experiment with settings further though.
dofm•17m ago
Yeah. I think it might speed up time to first token but I am not sure how much that matters.

I do enjoy their different personalities when they are tackling "explain this" type puzzles, though.

Gemma writes so well — like a concise code blogger. It makes you understand that the thing we hate about AI slop writing is specifically the cheesy, marketingese sycophantic ChatGPT tone. It's a choice to sound that way.

Qwen writes more tersely by default, like much english language documentation in Chinese open source projects. A couple of lines, code example, fact, code example, line of blurb.

I use this prompt every now and then with a new model. It's obviously a classic SQL puzzle but I've asked new web developers this in the past (prompted by discovering that a client's subcontractor didn't understand it and was therefore unable to migrate some code from relying on dodgy pre-MySQL 5.x behaviours)

—

  I have a MySQL 5 table like this: [id, label, category, score].   It contains a list of items in different categories (text names like cat1, cat2, cat3) with a numerical score. Is there a way I can write a SQL query to find the item in each category that has the highest score, without using a subquery? No two entries in any category share a score.
—

I enjoy seeing what it deduces from the subtext.

Without "thinking" mode on, they always initially fail and you need to prompt them to find the answer. With thinking mode, they both produce really nice explanations.

For me, as an old freelancer who is pretty cynical about vibe coding or "agentic engineering", what I really want is an AI tool that can help me start to solve problems and help me find the right terminology or generate some boilerplate I can tinker with. Both of these models do fine at the kind of "starter" writing that I want when I am trying to untangle an idea.

namnnumbr•51m ago
oMLX (https://github.com/jundot/omlx) makes running the mlx inference server quite easy for those interested in UI-based hosting. oMLX also supports mtp or dflash drafting.
w10-1•20m ago
Agreed (not sure what you mean by UI-based hosting).

oMLX does the caching I need to fit models that are near gross memory, and it handles most of the work in finding usable models. After cobbling together various solutions over months, I now just use oMLX, often from Xcode. I can tell the difference between Gemma-4 (local/free) and Claude (paid) only on the largest tasks.

reddit_clone•28m ago
>64 GB

Thats the rub. I have an M4 with 48G. I wonder if it is worth testing this out.

My past attempts (with Ollama and various LLMs) were too slow to use.

hkchad•18m ago
I have a M5 MAX with 128, local models are toys compared to hosted ones. I've spent a lot of time and money trying to make it work even 1/2 as well.
attogram•25m ago
8b max on a std 16gb macbook. Anything more and your mac is toast
Aurornis•18m ago
> The benchmark prompt was:

> Write a compact Python function that parses a unified diff and returns the changed file paths. Then explain two edge cases.

> Each benchmark generated about 128 tokens.

Generating 128 tokens is probably not enough for good benchmark results. MTP speedup depends on how often the predicted tokens are accepted. In my experience, the very early output has a higher acceptance rate, so short testing can give false positive speedups.

llama.cpp includes a tool specifically for benchmarking that will sweep the arguments for you so you don't have to restart the server and send it prompts:

https://github.com/ggml-org/llama.cpp/blob/master/tools/llam...

EDIT: Also the section about downloading the models should have mentioned that llama.cpp has a "-hf" argument that will download the models for you. I appreciate the author for sharing their experience, but for beginners this might not be the best guide to use.

vladgur•16m ago
I have used omlx.ai with great success to both download multiple mlx models (including gemma and qwen) suited for my hardware AND to be able to automagically launch both open-source and close-source (claude code, codex) harnesses using these models. All from a web or desktop UI

You would not need to follow a blog post with omlx IMHO

fridder•13m ago
It truly is the SOTA for local inference on mac. Even when there are regressions the dev(s) are insanely responsive. It is the most impressive opensource project I've seen in a awhile
jmkni•10m ago
FYI you can open Claude code in the terminal, point it at this article and just tell it to "do it", if you're feeling extra lazy
hanifbbz•7m ago
Here's a visual post for using LM Studio and VS Code (and Pi): https://blog.alexewerlof.com/p/local-llms-for-agentic-coding

One way or another local AI is the future. I actually find weaker models more interesting because it keeps me sharp (at the cost of velocity of course).

metadaemon•5m ago
Has anyone compared a setup like this to just using LM Studio?

Two Years of OCaml

https://borretti.me/article/two-years-ocaml
1•tosh•1m ago•0 comments

We Can Live with AI, but Not Like This

https://atroxclarus499795.substack.com/p/we-can-live-with-ai-but-not-like
1•Poppytam•2m ago•0 comments

Setting to disconnect your off-Meta activity is going away

https://help.instagram.com/1690298141991062
3•saharshpruthi•3m ago•1 comments

Show HN: Flag Study – learn the flags behind the World Cup

https://www.flagstudy.com/
1•jsheffers•5m ago•0 comments

When The C/C++ Users Journal Disappeared

https://freshsources.com/blog/files/cpp-source.html
2•chuckallison•7m ago•0 comments

Service Bindings for Postgres: Per-App Roles and Grants

https://openrun.dev/blog/service-binding/
1•ajayvk•10m ago•0 comments

Apple Has Officially Stopped Caring About Purveying Accurate Information

https://buttondown.com/theinternet/archive/apple-has-officially-stopped-caring-about-accuracy/
3•speckx•15m ago•1 comments

Large etchings of numbers signaling opposition to Trump appear on National Mall

https://www.washingtonpost.com/style/2026/06/11/apparent-etchings-86-47-seen-trump-threat-spotted...
1•evan_•15m ago•1 comments

Keynesian Beauty Contest

https://en.wikipedia.org/wiki/Keynesian_beauty_contest
2•throw0101c•15m ago•1 comments

How pierre diffs codeviewer component works

https://twitter.com/backnotprop/status/2065479594023829619
1•ramoz•16m ago•2 comments

You cannot control the mind, and that is not the problem

https://pilgrima.ge/p/the-way-back-home-is-a-circle
3•momentmaker•17m ago•1 comments

The C-64 Scene Database

https://csdb.dk/
3•jruohonen•18m ago•0 comments

New technology developed could prevent 70M tonnes of milk waste each year

https://www.abc.net.au/news/2026-06-07/act-technology-prevent-70-million-tonnes-of-milk-waste-ann...
1•speckx•18m ago•0 comments

Determinate Secure Packages 26.05

https://determinate.systems/blog/determinate-secure-packages-26-05/
1•biggestlou•20m ago•0 comments

Ask HN: How to deal with agents constantly messing up padding/alignment in UIs?

1•ex-aws-dude•20m ago•0 comments

Ghostty-Blackhole

https://github.com/s0xDk/ghostty-blackhole
2•pama•22m ago•0 comments

The end of progress against extreme poverty?

https://ourworldindata.org/end-progress-extreme-poverty
6•Luc•22m ago•1 comments

Show HN: DR Lens – Ahrefs Domain Rating in Chrome Toolbar

https://chromewebstore.google.com/detail/dr-lens-—-domain-rating-i/babgbadloikchhadbpmcmhokbbco...
1•kka•24m ago•0 comments

I counted every "incredible" in 11 years of Apple WWDC keynotes

https://buzznote.io/apple-wwdc/
2•radekmika•24m ago•0 comments

Cortex – Agent-Native Knowledge OS on Markdown (Karpathy's LLM Wiki, via MCP)

https://github.com/synpulse8-opensource/pulse8-ai-cortex-knowledge-vault
1•jiekepan•25m ago•0 comments

Making FlashAttention-4 faster for inference

https://modal.com/blog/flash-attention-4-faster
2•birdculture•26m ago•0 comments

Honeypot Design

https://bruceediger.com/posts/honeypot-design/
2•NaOH•31m ago•1 comments

Reclaiming Digital Sovereignty

https://www.ucl.ac.uk/bartlett/publications/2024/dec/reclaiming-digital-sovereignty
1•carschno•32m ago•0 comments

Knowledge Collapse

https://www.bostonreview.net/articles/knowledge-collapse/
3•pseudolus•32m ago•0 comments

Show HN: Geiger – A blast radius triage tool for any credential

https://github.com/puck-security/geiger
2•thesubtlety•33m ago•0 comments

Finding high-severity security issues with publicly available models

https://twitter.com/RampLabs/status/2059678575939273091
1•gmays•34m ago•0 comments

Being an old school web-based sports sim dev in the era of vibe coded games

https://zengm.com/blog/2026/06/vibecoded-games/
1•YesBox•35m ago•0 comments

Measuring LLMs' impact on N-day exploits

https://red.anthropic.com/2026/n-days/
3•hackerBanana•35m ago•0 comments

How ClickHouse Became Fast at Joins

https://clickhouse.com/blog/clickhouse-fast-joins
3•eatonphil•35m ago•0 comments

A Fake Bug Report Hijacks Your AI Coding Agent – and Nothing Catches It

https://tenetsecurity.ai/blog/agentjacking-coding-agents-with-fake-sentry-errors/
2•patrickdavey•37m ago•0 comments