
Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
1•surprisetalk•1m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
2•TheCraiggers•2m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
1•birdculture•3m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
3•doener•3m ago•1 comment

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

https://github.com/vgrippa/myflames
1•tanelpoder•4m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•4m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
1•tanelpoder•6m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•6m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
1•elsewhen•9m ago•0 comments

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•11m ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•14m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
1•mooreds•15m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•15m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•15m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•15m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•15m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•17m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•17m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
3•nick007•18m ago•0 comments

What the News media thinks about your Indian stock investments

https://stocktrends.numerical.works/
1•mindaslab•19m ago•0 comments

Running Lua on a tiny console from 2001

https://ivie.codes/page/pokemon-mini-lua
1•Charmunk•20m ago•0 comments

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
2•belter•22m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•23m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
2•momciloo•24m ago•0 comments

Kinda Surprised by Seadance2's Moderation

https://seedanceai.me/
1•ri-vai•24m ago•2 comments

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
2•valyala•24m ago•1 comment

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
2•sgt•24m ago•0 comments

Malwarebytes Is Now in ChatGPT

https://www.malwarebytes.com/blog/product/2026/02/scam-checking-just-got-easier-malwarebytes-is-n...
1•m-hodges•24m ago•0 comments

Thoughts on the job market in the age of LLMs

https://www.interconnects.ai/p/thoughts-on-the-hiring-market-in
1•gmays•25m ago•0 comments

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
3•Keyframe•28m ago•0 comments

MCP in LM Studio

https://lmstudio.ai/blog/lmstudio-v0.3.17
240•yags•7mo ago

Comments

chisleu•7mo ago
Just ordered a $12k Mac Studio w/ 512GB of integrated RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses without them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.

As in HPC environments, you are going to need ultra-fast interconnects, but it's all IP-based.

dchest•7mo ago
I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).

Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.

minimaxir•7mo ago
8 GB of RAM with local LLMs in general is iffy: an 8-bit quantized Qwen3-4B is 4.2GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without resorting to heavy quantization.
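As a rough, back-of-the-envelope illustration of where those numbers come from (a sketch only; the 1.2x overhead factor for KV cache and runtime buffers is an assumption, not a measured value):

    # Rough estimate of memory needed for a quantized model (illustrative only).
    def model_memory_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        weight_bytes = params_billion * 1e9 * bits_per_weight / 8
        return weight_bytes * overhead / 1e9  # overhead ~ KV cache + runtime buffers

    print(round(model_memory_gb(4, 8), 1))    # ~4.8 GB for an 8-bit 4B model
    print(round(model_memory_gb(4, 4.5), 1))  # ~2.7 GB for a Q4_K_M-style quant (~4.5 bits/weight)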
hnuser123456•7mo ago
But 8GB of Apple RAM is 16GB of normal RAM.

https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a-macbook-p...

arrty88•7mo ago
I concur. I just upgraded from m1 air with 8gb to m4 with 24gb. Excited to run bigger models.
diggan•7mo ago
> m4 with 24gb

Wow, that is probably analogous to 48GB on other systems then, if we were to ask an Apple VP?

vntok•7mo ago
Not sure what Apple VPs have to do with the tech but yeah, pretty much any core engineer you ask at Apple will tell you this.

Here is a nice article with some info about what memory compression is and how it works: https://arstechnica.com/gadgets/2013/10/os-x-10-9/#page-17

It's been a hard technical problem, but it's pretty much solved by now, having first debuted in 2012-2013.

pxc•7mo ago
I've heard good things about how macOS handles memory relative to other operating systems. But Linux and Windows both have memory compression nowadays. So the claim is then not that memory compression makes your RAM twice as effective, but that macOS' memory compression is twice as good as the real and existing memory compression available on other operating systems.

Doesn't such a claim... need stronger evidence?

minimaxir•7mo ago
Interestingly it was AI (Apple Intelligence) that was the primary reason Apple abandoned that hedge.
dchest•7mo ago
It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well for this task. It's amazing. I've been running various small models on this 8 GB Air since the first Llama and GPT-J, and they improved so much!

macOS virtual memory does a good job of swapping stuff in and out to the SSD.

karmakaze•7mo ago
Nice. Ironically well suited for non-Apple Intelligence.
incognito124•7mo ago
> I'm going to download it with Safari

Oof you were NOT joking

noman-land•7mo ago
Safari to download LM Studio. LM Studio to download models. Models to download Firefox.
teaearlgraycold•7mo ago
The modern ninite
sneak•7mo ago
I already got one of these. I’m spoiled by Claude 4 Opus; local LLMs are slower and lower quality.

I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.

> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

lol, yup. same.

chisleu•7mo ago
Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to stop using opus for plan mode in my Agent because it is just so expensive. I'm using Gemini 2.5 pro for that now.

I'm considering ordering one of these today: https://www.newegg.com/p/N82E16816139451?Item=N82E1681613945...

It looks like it will hold 5 GPUs with a single slot open for infiniband

Then local models might be lower quality, but it won't be slow! :)

kristopolous•7mo ago
The GPUs are the hard things to find unless you want to pay like 50% markup
sneak•7mo ago
That’s just what they cost; MSRP is irrelevant. They’re not hard to find, they’re just expensive.
evo_9•7mo ago
I was using Claude 3.7 exclusively for coding, but it sure seems like it got worse suddenly about 2–3 weeks back. It went from writing pretty solid code I had to make only minor changes to, to being completely off the rails: altering files unrelated to my prompt, undoing fixes from the same conversation, reinventing db access, and ignoring existing coding 'standards' established in the existing codebase. It became so untrustworthy that I finally gave OpenAI o3 a try and honestly, I was pretty surprised how solid it has been. I've been using o3 since, and I find it generally does exactly what I ask, especially if you have a well-established project with plenty of code for it to reference.

Just wondering if Claude 3.7 has seemed different lately for anyone else? It was my go-to for several months, and I'm no fan of OpenAI, but o3 has been rock solid.

jessmartin•7mo ago
Could be the prompt and/or tool descriptions in whatever tool you are using Claude in that degraded. Have definitely noticed variance across Cursor, Claude Code, etc even with the exact same models.

Prompts + tools matter.

esskay•7mo ago
Cursor became awful over the last few weeks, so it's likely them; no idea what they did to their prompt, but it's just been incredibly poor at most tasks regardless of which model you pick.
sneak•7mo ago
Me too. (re: Claude; I haven’t switched models.) It sucks because I was happily paying >$1k/mo in usage charges and then it all went south.
sneak•7mo ago
I’m firehosing about $1k/mo at Cursor on pay-as-you-go and am happy to do it (it’s delivering 2-10k of value each month).

What cards are you gonna put in that chassis?

teaearlgraycold•7mo ago
What are you going to do with the LLMs you run?
chisleu•7mo ago
Currently I'm using gemini 2.5 and claude 3.7 sonnet for coding tasks.

I'm interested in using models for code generation, but I'm not expecting much in that regard.

I'm planning to attempt fine tuning open source models on certain tool sets, especially MCP tools.

prettyblocks•7mo ago
I've been using openwebui and am pretty happy with it. Why do you like lm studio more?
truemotive•7mo ago
Open WebUI can leverage the built-in web server in LM Studio, just FYI in case you thought it was primarily a chat interface.
prophesi•7mo ago
Not OP, but with LM Studio I get a chat interface out-of-the-box for local models, while with openwebui I'd need to configure it to point to an OpenAI API-compatible server (like LM Studio). It can also help determine which models will work well with your hardware.

LM Studio isn't FOSS though.

I did enjoy hooking up OpenWebUI to Firefox's experimental AI Chatbot. (browser.ml.chat.hideLocalhost to false, browser.ml.chat.provider to localhost:${openwebui-port})

s1mplicissimus•7mo ago
I recently tried OpenWebUI but it was so painful to get it to run with a local model. That "first run experience" of LM Studio is pretty fire in comparison. Can't really talk about actually working with it though, still waiting for the 8GB download.
prettyblocks•7mo ago
Interesting. I run my local llms through ollama and it's zero trouble to get that working in openwebui as long as the ollama server is running.
diggan•7mo ago
I think that's the thing. Compared to LM Studio, just running Ollama (fiddling around with terminals) is more complicated than the full E2E of chatting with LM Studio.

Of course, for folks used to terminals, daemons and so on it makes sense from the get go, but for others it seemingly doesn't, and it doesn't help that Ollama refuses to communicate what people should understand before trying to use it.

noman-land•7mo ago
I love LM Studio. It's a great tool. I'm waiting for another generation of Macbook Pros to do as you did :).
imranq•7mo ago
I'd love to host my own LLMs but I keep getting held back by the quality and affordability of cloud LLMs. Why go local unless there's private data involved?
mycall•7mo ago
Offline is another use case.
seanmcdirmid•7mo ago
Nothing like playing around with LLMs on an airplane without an internet connection.
asteroidburger•7mo ago
If I can afford a seat above economy with room to actually, comfortably work on a laptop, I can afford the couple bucks for wifi for the flight.
seanmcdirmid•7mo ago
If you are assuming that your Hainan airlines flight has wifi that isn't behind the GFW, even outside of cattle class, I have some news for you...
sach1•7mo ago
Getting around the GFW is trivially easy.
seanmcdirmid•7mo ago
ya ya, just buy a VPN, pay the yearly subscription, and then have them disappear the week after you paid. Super trivially frustrating.
vntok•7mo ago
VPN providers are first and foremost trust businesses. Why would you choose and pay one that is not well established and trusted? Mine have been there for more than a decade by now.

Alternatively, you could just set up your own (cheaper?) VPN relay on the tiniest VPS you can rent on AWS or IBM Cloud, right?

seanmcdirmid•7mo ago
The VPN providers that get you over the GFW in China are Chinese, and China is not yet a high-trust society, just like how they’ll take your payment for one year of gym fees and then disappear the next week (sigh). If AWS or IBM Cloud find out you are using them as a VPN to jump the GFW, they will ban you for life; Microsoft, IBM, and Amazon aren’t interested in having their whole cloud added to the GFW block list. Many people have tried this (including Microsofties in China with free Azure credits) and they’ve all been dealt with harshly by the cloud providers.
MangoToupe•7mo ago
Woah there Mr Money, slow down with these assumptions. A computer is worth the investment. But paying a cent extra to airlines? Unacceptable.
seanmcdirmid•7mo ago
The $3000 that a MBP M3 Max with 64GB of RAM costs might cover a round-trip business class ticket for a trans-Pacific flight…if it is on sale (on a Chinese carrier, probably with GFW internet).
diggan•7mo ago
Some of us don't have the most reliable ISPs or even network infrastructure, and I say that as someone who lives in Spain :) I live outside a huge metropolitan area and Vodafone fiber went down twice this year, not even counting the time the country's electricity grid was down for like 24 hours.
PeterStuer•7mo ago
Same. For 'sovereignty' reasons I will eventually move to local processing, but for now, in development/prototyping, the gap with hosted LLMs seems too wide.
diggan•7mo ago
There are some use cases I use LLMs for where I don't care a lot about the data being private (although that's a plus) but I don't want to pay XXX€ for classifying some data and I particularly don't want to worry about having to pay that again if I want to redo it with some changes.

Using local LLMs for this I don't worry about the price at all, I can leave it doing three tries per "task" without tripling the cost if I wanted to.

It's true that there is an upfront cost, but it's way easier to get over that hump than over on-demand/per-token costs, at least for me.

zackify•7mo ago
I love LM Studio, but I’d never waste $12k like that. The memory bandwidth is too low, trust me.

Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.

marci•7mo ago
You can't run DeepSeek-V3/R1 on the RTX Pro 6000, not to mention the upcoming 1M-context Qwen models, or the current Qwen3-235B.
112233•7mo ago
I can run full DeepSeek R1 on an M1 Max with 64GB of RAM: around 0.5 t/s with a small quant. A Q4 quant of Maverick (253 GB) runs at 2.3 t/s on it (no GPU offload).

Practically, a last-gen (or even ES/QS) EPYC or Xeon (with AMX), enough RAM to fill all 8 or 12 channels, plus fast storage (4 Gen5 NVMes are almost 60 GB/s) looks, on paper at least, like the cheapest way to run these huge MoE models at hobbyist speeds.

marci•7mo ago
If you're talking about DeepSeek R1 with llama.cpp and mmap, then at this point you can run DeepSeek R1 on a Raspberry Pi Zero with a 256GB microSD card and a phone charger. The only metric left to know is one's patience.
tymscar•7mo ago
Why would they pay 2/3 of the price for something with 1/5 of the RAM?

The whole point of spending that much money for them is to run massive models, like the full R1, which the Pro 6000 can't.

zackify•7mo ago
Because waiting forever for initial prompt processing, with a realistic number of MCP tools enabled on a prompt, is going to suck without the most bandwidth possible.

And you are never going to sit around waiting for anything larger than the 96+ GB of RAM that the RTX Pro has.

If you're using it for background tasks and not coding, it's a different story.

johndough•7mo ago
If the MCP tools come first in the conversation, it should be technically possible to cache the activations, so you do not have to recompute them each time.
pests•7mo ago
Initial prompt processing with a large static context (system prompt + tools + whatever) could technically be improved by checkpointing the model state and reusing it for future prompts. Not sure if any tools support this.
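A minimal sketch of that idea (purely illustrative; `prefill` and `decode` are hypothetical stand-ins for whatever a real inference runtime exposes): pay for the static prefix once, checkpoint the resulting KV state, and reuse it for every new prompt.

    # Toy illustration of reusing a checkpointed KV state for a static prefix
    # (system prompt + tool definitions). `prefill`/`decode` are hypothetical APIs.
    import copy

    _prefix_checkpoints = {}

    def generate(runtime, static_prefix: str, user_prompt: str) -> str:
        if static_prefix not in _prefix_checkpoints:
            # Pay the full prompt-processing cost only once per unique prefix.
            _prefix_checkpoints[static_prefix] = runtime.prefill(static_prefix)
        state = copy.deepcopy(_prefix_checkpoints[static_prefix])  # keep the checkpoint pristine
        state = runtime.prefill(user_prompt, past=state)           # only the new tokens get processed
        return runtime.decode(state)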
112233•7mo ago
Dropping in late into this discussion, but is there any way to "comfortably" use multiple precomputed kv-caches with current models, in the style of this work: https://arxiv.org/abs/2212.10947 ?

Meaning, I pre-parse multiple documents, and the prompt and completion attention sees all of them, but there is no attention between the documents (they are all encoded in the same overlapping positions).

This way you can include a basically unlimited amount of data in the prompt, paying for it with performance.

tucnak•7mo ago
https://docs.vllm.ai/projects/production-stack/en/latest/tut...
storus•7mo ago
M3 Ultra GPU is around 3070-3080 for the initial token processing. Not great, not terrible.
MangoToupe•7mo ago
> And you are never going to sit around waiting for anything larger than the 96+gb of ram that the RTX pro has.

Am I the only person that gives aider instructions and leaves it alone for a few hours? This doesn't seem that difficult to integrate into my workflow.

diggan•7mo ago
> Am I the only person that gives aider instructions and leaves it alone for a few hours?

Probably not, but in my experience, if it takes longer than 10-15 minutes it's either stuck in a loop or down the wrong rabbit hole. But I don't use it for vibe coding or anything "big scope" like that, but more focused changes/refactors so YMMV

chisleu•7mo ago
You are correct that inference speed per $ is not optimized with this purchase.

What is optimized is the ability to fine-tune medium-size models (~200GB) per $.

You just can't get 500GB of VRAM for less than $100k. Even with $9k Blackwell cards, you have $10k in a barebones GPU server. You can't use commodity hardware and cluster it because you need fast interconnects. I'm talking 200-400GB/s interconnects. And those take yet another PCIe slot and require expensive Infiniband switches.

Shit gets costly fast. I agonized over this purchase for weeks, eventually deciding that it's the easiest path to success for my purposes. Not for everyone's, but for mine.

t1amat•7mo ago
(Replying to both siblings questioning this)

If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?

And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.

storus•7mo ago
The RTX Pro 6000 can't do DeepSeek R1 671B Q4; you'd need 5-6 of them, which makes it way more expensive. Moreover, a Mac Studio will do it at 150W whereas the Pro 6000 would start at 1500W.
diggan•7mo ago
> Moreover, MacStudio will do it at 150W whereas Pro 6000 would start at 1500W.

No, the Pro 6000 pulls 600W max; not sure where you get 1500W from, that's more than double the specification.

Besides, what is the token/second or second/token, and prompt processing speed for running DeepSeek R1 671B on a Mac Studio with Q4? Curious about those numbers, because I have a feeling they're very far off each other.

storus•7mo ago
You need at least 5x Pro 6000 (for smaller contexts), let's say Max-Q edition running at 300W, so overall you get a minimum of 1500W.

You get around 6 tokens/second which is not great but not terrible. If you use very long prompts, things get bad.

smcleod•7mo ago
RTX is nice, but it's memory-limited and requires a full desktop machine to run it in. I'd take slower inference (as long as it's not less than 15 tk/s) for more memory any day!
diggan•7mo ago
I'd love to see more Very-Large-Memory Mac Studio benchmarks for prompt processing and inference. The few benchmarks I've seen either failed to take prompt processing into account, didn't share the exact weights+setup that were used, or showed really abysmal performance.
chisleu•7mo ago
Oh I plan to produce a ton of that. I'll post a blog on it to HN and /r/localllama when I'm done.
chisleu•7mo ago
Only on HN can buying a $12k badass computer be a waste of money
storus•7mo ago
If the rumors about splitting CPU/GPU in new Macs are true, your MacStudio will be the last one capable of running DeepSeek R1 671B Q4. It looks like Apple had an accidental winner that will go away with the end of unified RAM.
phren0logy•7mo ago
I have not heard this rumor. Source?
prophesi•7mo ago
I believe they're talking about the rumors by an Apple supply chain analyst, Ming-Chi Kuo.

https://www.techspot.com/news/106159-apple-m5-silicon-rumore...

diggan•7mo ago
Seems Apple is waking up to the fact that if it's too easy to run weights locally, there really isn't much sense in having their own remote inference endpoints, so time to stop the party :)
prophesi•7mo ago
I thought their goal was to completely remove the need for a remote inference endpoint in the first place? May have read your comment wrong.
diggan•7mo ago
No, I think Apple has been clear from the beginning that they won't be able to do everything on the devices themselves; that's why they're building the infrastructure/software for their "cloud intelligence system" or whatever they call it.
whatevsmate•7mo ago
I did this a month ago and don't regret it one bit. I had a long laundry list of ML "stuff" I wanted to play with or questions to answer. There's no world in which I'm paying by the request, or token, or whatever, for hacking on fun projects. Keeping an eye on the meter is the opposite of having fun and I have absolutely nowhere I can put a loud, hot GPU (that probably has "gamer" lighting no less) in my fam's small apartment.
chisleu•7mo ago
Right on. I also have a laundry list of ML things I want to do starting with fine tuning models.

I don't mind paying for models to do things like code. I like to move really fast when I'm coding. But for other things, I just didn't want to spend a week or two coming up to speed on the hardware needed to build a GPU system. You can just order a big GPU box, but it's going to cost you astronomically right now. Building a system with 4-5 PCIe 5.0 x16 slots, enough power, enough PCIe lanes... It's a lot to learn. You can't go on PCPartPicker and just hunt for a motherboard with 6 double-width slots.

This is a machine to let me do some things with local models. My first goal is to run some quantized version of the new V3 model and try to use it for coding tasks.

I expect it will be slow for sure, but I just want to know what it's capable of.

datpuz•7mo ago
I genuinely cannot wrap my head around spending this much money on hardware that is dramatically inferior to hardware that costs half the price. macOS is not even great anymore; they stopped improving their UX like a decade ago.
chisleu•7mo ago
How can you say something so brave, and so wrong?
minimaxir•7mo ago
LM Studio has quickly become the best way to run local LLMs on an Apple Silicon Mac: no offense to vllm/ollama and other terminal-based approaches, but LLMs have many levers for tweaking output and sometimes you need a UI to manage it. Now that LM Studio supports MLX models, it's one of the most efficient too.

I'm not bullish on MCP, but at the least this approach gives a good way to experiment with it for free.

nix0n•7mo ago
LM Studio is quite good on Windows with Nvidia RTX also.
boredemployee•7mo ago
Care to elaborate? I have an RTX 4070 (12GB VRAM) + 64GB RAM; I wonder what models I can run with it. Anything useful?
nix0n•7mo ago
LM Studio's model search is pretty good at showing what models will fit in your VRAM.

For my 16gb of VRAM, those models do not include anything that's good at coding, even when I provide the API documents via PDF upload (another thing that LM Studio makes easy).

So, not really, but LM Studio at least makes it easier to find that out.

boredemployee•7mo ago
ok, ty for the reply!
Eupolemos•7mo ago
If you go to huggingface.co, you can tell it what specs you have and when you go to a model, it'll show you what variations of that model are likely to run well.

So if you go to this[0] random model, on the right there is a list of quantizations based on bits, and those you can run will be shown in green.

[0] https://huggingface.co/unsloth/Mistral-Small-3.1-24B-Instruc...

pzo•7mo ago
I just wish they did some facelifting of the UI. Right now it's too colorful for me, with many different shades of similar colors. I wish they'd copy a color palette from Google AI Studio, or from Trae or PyCharm.
chisleu•7mo ago
> I'm not bullish on MCP

You gotta help me out. What do you see holding it back?

minimaxir•7mo ago
tl;dr the current hype around it is a solution looking for a problem and at a high level, it's just a rebrand of the Tools paradigm.
mhast•7mo ago
It's "Tools as a service", so it's really trying to make tool calling easier to use.
ijk•7mo ago
Near as I can tell it's supposed to make calling other people's tools easier. But I don't want to spin up an entire server to invoke a calculator. So far it seems to make building my own local tools harder, unless there's some guidebook I'm missing.
xyc•7mo ago
It's a protocol that doesn't dictate how you are calling the tool. You can use in-memory transport without needing to spin up a server. Your tool can just be a function, but with the flexibility of serving to other clients.
ijk•7mo ago
Are there any examples of that? All the documentation I saw seemed to be about building an MCP server, with very little about connecting an existing inference infrastructure to local functions.
xyc•7mo ago
For TypeScript you can refer to https://github.com/modelcontextprotocol/typescript-sdk/blob/...

There isn't much documentation available right now, but you can ask a coding agent, e.g. Claude Code, to generate an example.

cchance•7mo ago
You're not spinning up a whole server, lol. Most MCPs can be run locally and talked to over stdio; they're just apps that the LLM can call, and what they talk to or do is up to the MCP writer. It's easier to have an MCP that communicates what it can do and handles the back and forth than to write non-standard middleware to handle, say, calls to an API, or using AppleScript, or VMware, or something else...
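For a sense of how small "a whole server" actually is here, a local stdio MCP server can be just a few lines; a sketch using the official Python SDK's FastMCP helper (assumes the `mcp` package is installed; the calculator tool is only an example):

    # Minimal local MCP server that talks over stdio.
    # The host app (LM Studio, Claude Desktop, ...) launches it as a subprocess.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("calculator")

    @mcp.tool()
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default

The host then only needs an entry in its MCP config pointing at the command that runs this script.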
ijk•7mo ago
I wish the documentation was clearer on that point; I went looking through their site and didn't see any examples that weren't oversimplified REST API calls. I imagine they might have updated it since then, or I missed something.
zackify•7mo ago
Ollama doesn’t even have a way to customize the context size per model and persist it. LM studio does :)
Anaphylaxis•7mo ago
This isn't true. You can `ollama run {model}`, `/set parameter num_ctx {ctx}` and then `/save`. Recommended to `/save {model}:{ctx}` to persist on model update
truemotive•7mo ago
This can be done with custom Modelfiles as well, I was pretty bent when I found out that 2048 was the default context length.

https://ollama.readthedocs.io/en/modelfile/

zackify•7mo ago
As of 2 weeks back, if I did this, it would reset the moment Cline made an API call, but LM Studio would work correctly. I’ll have to try again. I even confirmed Cline was not overriding num_ctx.
visiondude•7mo ago
LM Studio works surprisingly well on an M3 Ultra with 64GB and 27B models.

Nice to have a local option, especially for some prompts.

squanchingio•7mo ago
It'll be nice to have the MCP servers exposed through LM Studio's OpenAI-like endpoints.
patates•7mo ago
What models are you using on LM Studio for what task and with how much memory?

I have a 48GB MacBook Pro, and Gemma3 (one of the abliterated ones) fits my non-code use case perfectly (generating crime stories where the reader tries to guess the killer).

For code, I still call Google to use Gemini.

robbru•7mo ago
I've been using the Google Gemma QAT models in 4B, 12B, and 27B with LM Studio with my M1 Max. https://huggingface.co/lmstudio-community/gemma-3-12B-it-qat...
t1amat•7mo ago
I would recommend Qwen3 30B A3B for you. The MLX 4bit DWQ quants are fantastic.
redman25•7mo ago
Qwen is great but for creative writing I think Gemma is a good choice. It has better EQ than Qwen IMO.
api•7mo ago
I wish LM Studio had a pure daemon mode. It's better than ollama in a lot of ways but I'd rather be able to use BoltAI as the UI, as well as use it from Zed and VSCode and aider.

What I like about ollama is that it provides a self-hosted AI provider that can be used by a variety of things. LM Studio has that too, but you have to have the whole big chonky Electron UI running. Its UI is powerful but a lot less nice than e.g. BoltAI for casual use.

SparkyMcUnicorn•7mo ago
There's a "headless" checkbox in settings->developer
diggan•7mo ago
Still, you need to install and run the AppImage at least once to enable the "lms" CLI, which can later be used. It would be nice to have a completely GUI-less installation/use method too.
t1amat•7mo ago
The UI is the product. If you just want the engine, use mlx-omni-server (for MLX) or llama-swap (for GGUF) and huggingface-cli (for model downloads).
diggan•7mo ago
Those don't offer the same features as LM Studio itself does, even when you don't consider the UI. If there was a "LM Engine" CLI I could install, then yeah, but there isn't, hence the need to run the UI once to get "the engine".
SparkyMcUnicorn•7mo ago
I haven't used this and maybe it doesn't solve the problem you're describing, but might be worth looking at.

https://github.com/lmstudio-ai/lms

https://lmstudio.ai/docs/cli

rhet0rica•7mo ago
Oh, that horrible Electron UI. Under Windows it pegs a core on my CPU at all times!

If you're just working as a single user via the OpenAI protocol, you might want to consider koboldcpp. It bundles a GUI launcher, then starts in text-only mode. You can also tell it to just run a saved configuration, bypassing the GUI; I've successfully run it as a system service on Windows using nssm.

https://github.com/LostRuins/koboldcpp/releases

Though there are a lot of roleplay-centric gimmicks in its feature set, its context-shifting feature is singular. It caches the intermediate state used by your last query, extending it to build the next one. As a result you save on generation time with large contexts, and also any conversation that has been pushed out of the context window still indirectly influences the current exchange.

diggan•7mo ago
> Oh, that horrible Electron UI. Under Windows it pegs a core on my CPU at all times!

Worse, I'd say, considering what people use LM Studio for, is the VRAM it occupies even when the UI and everything is idle. Somehow, it's using 500MB VRAM while doing nothing, while Firefox with ~60 active tabs is using 480MB. gnome-shell itself also sits around 450MB and is responsible for quite a bit more than LM Studio.

Still, LM Studio is probably the best all-in-one GUI around for local LLM usage, unless you go the terminal route.

politelemon•7mo ago
The initial experience with LM Studio and MCP doesn't seem to be great; I think their docs could do with a happy-path demo for newcomers.

Upon installing, the first model offered is google/gemma-3-12b - which in fairness is pretty decent compared to others.

It's not obvious how to show the right sidebar they're talking about; it's the flask icon, which turns into a collapse icon when you click it.

I set the MCP up with Playwright, asked it to read the top headline from HN, and it got stuck in an infinite loop of navigating to Hacker News but doing nothing with the output.

I wanted to try it out with a few other models, but figuring out how to download new models isn't obvious either; it turned out to be the search icon. Anyway, other models didn't fare much better; some outright ignored the tools despite having the capacity for 'tool use'.

t1amat•7mo ago
Gemma3 models can follow instructions but were not trained to call tools, which is the backbone of MCP support. You would likely have a better experience with models from the Qwen3 family.
cchance•7mo ago
That latter issue isn't an LM Studio issue... it's a model issue.
Thews•7mo ago
Others mentioned Qwen3, which works fine with HN stories for me, but the comments still trip it up and it'll start thinking the comments are part of the original question after a while.

I also tried the recent deepseek 8b distill, but it was much worse for tool calling than qwen3 8b.

maxcomperatore•7mo ago
good.
v3ss0n•7mo ago
Closed source - won't touch.
xyc•7mo ago
Great to see more local AI tools supporting MCP! Recently I've also added MCP support to recurse.chat. When running locally (LLaMA.cpp and Ollama) it still needs to catch up in terms of tool calling capabilities (for example tool call accuracy / parallel tool calls) compared to the well known providers but it's starting to get pretty usable.
rshemet•7mo ago
hey! we're building Cactus (https://github.com/cactus-compute), effectively Ollama for smartphones.

I'd love to learn more about your MCP implementation. Wanna chat?

zaps•7mo ago
Not to be confused with FL Studio
bbno4•7mo ago
Is there an app that uses OpenRouter / Claude or something locally but has MCP support?
eajr•7mo ago
I've been considering building this. Haven't found anything yet.
cchance•7mo ago
vscode with roocode... just use the chat window :S
cedws•7mo ago
I’m looking for something like this too. Msty is my favourite LLM UI (supports remote + local models) but unfortunately has no MCP support. It looks like they’re trying to nudge people into their web SaaS offering which I have no interest in.
jtreminio•7mo ago
I’ve been wanting to try LM Studio but I can’t figure out how to use it over local network. My desktop in the living room has the beefy GPU, but I want to use LM Studio from my laptop in bed.

Any suggestions?

skygazer•7mo ago
Use an OpenAI-compatible API client on your laptop and LM Studio on your server, and point the client to your server. LM Studio's server can serve an LLM on a desired port using the OpenAI-style chat completions API. You can also install OpenWebUI on your server, connect to it via a web browser, and configure it to use the LM Studio connection for its LLM.
numpad0•7mo ago

  [>_] -> [.* Settings] -> Serve on local network ( o)
Any OpenAI-compatible client app should work - use the IP address of the host machine as the API server address. The API key can be bogus or blank.
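For example, from the laptop, something like this should work with the `openai` Python package (a sketch: the IP is the desktop's LAN address, and 1234 is LM Studio's default server port unless it was changed):

    # Talk to LM Studio's OpenAI-compatible server from another machine on the LAN.
    from openai import OpenAI

    client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")  # key can be anything

    resp = client.chat.completions.create(
        model="local-model",  # LM Studio serves whichever model is currently loaded
        messages=[{"role": "user", "content": "Hello from the laptop"}],
    )
    print(resp.choices[0].message.content)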
sixhobbits•7mo ago
MCP terminology is already super confusing, but this seems to just introduce "MCP Host" randomly in a way that makes no sense to me at all.

> "MCP Host": applications (like LM Studio or Claude Desktop) that can connect to MCP servers, and make their resources available to models.

I think everyone else is calling this an "MCP Client", so I'm not sure why they would want to call themselves a host - makes it sound like they are hosting MCP servers (definitely something that people are doing, even though often the server is run on the same machine as the client), when in fact they are just a client? Or am I confused?

guywhocodes•7mo ago
MCP Host is terminology from the spec. It's the software that makes LLM calls, builds prompts, interprets tool call requests, performs them, etc.
sixhobbits•7mo ago
So it is, I stand corrected. I googled "MCP host" and the LM Studio link was the first result.

Some more discussion on the confusion here https://github.com/modelcontextprotocol/modelcontextprotocol... where they acknowledge that most people call it a client and that that's ok unless the distinction is important.

I think host is a bad term for it though as it makes more intuitive sense for the host to host the server and the client to connect to it, especially for remote MCP servers which are probably going to become the default way of using them.

kreetx•7mo ago
I'm with you on the confusion, it makes no sense at all to call it a host. MCP host should host the MCP server (yes, I know - that is yet a separate term).

The MCP standard seems a mess; e.g., take this paragraph from here[1]:

> In the Streamable HTTP transport, the server operates as an independent process that can handle multiple client connections.

Yes, obviously, that is what servers do. Also, what is "Streamable HTTP"? Comet, HTTP2, or even websockets? SSE could be a candidate, but it isn't as it says "Streamable HTTP" replaces SSE.

> This transport uses HTTP POST and GET requests.

Guys, POST and GET are verbs of the HTTP protocol; TCP is the transport. I guess they could say that they use the HTTP protocol, which only uses the POST and GET verbs (if that is the case).

> Server can optionally make use of Server-Sent Events (SSE) to stream multiple server messages.

This would make sense if there weren't the note "This replaces the HTTP+SSE transport" right below the title.

> This permits basic MCP servers, as well as more feature-rich servers supporting streaming and server-to-client notifications and requests.

Again, how is streaming implemented (what is "Streamable HTTP")? Also, "server-to-client .. requests"? SSE is unidirectional, so are those requests happening over secondary HTTP requests?

--

And then the 2.0.1 Security Warning seems like a blob of words on security, no reference to maybe same-origin. Also, "for local servers bind to localhost and then implement proper authentication" - are both of those together ever required? Is it worth it to even say that servers should implement proper authentication?

Anyway, reading the entire documentation, one might be able to put together a charitable version of the MCP puzzle that actually makes sense. But it does seem that it isn't written by engineers, in which case I don't understand why, or for whom, it was written.

[1] https://modelcontextprotocol.io/specification/draft/basic/tr...

diggan•7mo ago
> But it does seem that it isn't written by engineers

As far as I can tell, unsurprisingly, the MCP specification was written with the help of LLMs, and seemingly hasn't been carefully reviewed because as you say, a bunch of the terms have straight up wrong definitions.

kreetx•7mo ago
Using LLMs is entirely fine, but poor review for a protocol definition is... degenerate. Aren't protocols supposed to be precise?
remram•7mo ago
It was written by one vendor for their own use. It is miles away from an RFC or "standard"
diggan•7mo ago
Regardless of whether it's an RFC, a standard, or whatever, protocols need to be precise, exact, and correct. And I think they wrote MCP with the idea of others using it; otherwise, why even make it public if it's just for their own usage?
qntty•7mo ago
It's confusing but you just have to read the official docs

https://modelcontextprotocol.io/specification/2025-03-26/arc...

mkagenius•7mo ago
On M1/M2/M3 Mac, you can use Apple Containers to automate[1] the execution of the generated code.

I have one running locally with this config:

    {
      "mcpServers": {
        "coderunner": {
          "url": "http://coderunner.local:8222/sse"
        }
      }
    }

1. CodeRunner: https://github.com/BandarLabs/coderunner (I am one of the authors)
smcleod•7mo ago
I really like LM Studio but their license / terms of use are very hostile. You're in breach if you use it for anything work related - so just be careful folks!
jmetrikat•7mo ago
Great! It's very convenient to try MCP servers with local models that way.

Just added the `Add to LM Studio` button to the Anytype MCP server; looks nice: https://github.com/anyproto/anytype-mcp

b0dhimind•7mo ago
I wonder how LM Studio and AnythingLLM will contrast, especially in the upcoming months... I like AnythingLLM's workflow editor. I'd like something to grow into for my doc-heavy job. I don't want to be installing and trying both.