LM Studio 0.4

https://lmstudio.ai/blog/0.4.0

238•jiqiren•1w ago

Comments

jiqiren•1w ago

This release introduces parallel requests with continuous batching for high throughput serving, all-new non-GUI deployment option, new stateful REST API, and a refreshed user interface.

observationist•1w ago

Awesome - having the API, MCP integrations, refined CLI give you everything you might want. I have some things I'd wanted to try with ChainForge and LMStudio that are now almost trivial.

Thanks for the updates!

nubg•1w ago

are parallel requests "free"? or do you half performance when sending two requests in parallel?

anon373839•1w ago

I have seen ~1,300 tokens/sec of total throughout with Llama 3 8B on a MacBook Pro. So no, you don’t halve the performance. But running batched inference takes more memory, so you have to use shorter contexts than if you weren’t batching.

minimaxir•1w ago

LMStudio introducing a command line interface makes things come full circle.

Helithumper•1w ago

For context, LMStudio has had a CLI for a while it just required the desktop app to be open already. This makes it where you can run LMStudio properly headless and not just from a terminal while the desktop app is open.

`lms chat` has existed, `lms daemon up` / "llmster" is the new command.

embedding-shape•1w ago

> This makes it where you can run LMStudio properly headless and not just from a terminal while the desktop app is open

Ah, this is great, been waiting for this! I naively created some tooling on top of the API from the desktop app after seeing they had a CLI, then once I wanted to deploy and run it on a server, I got very confused that the desktop app actually installs the CLI and it requires the desktop app running.

Great that they finally got it working fully headless now :)

syntaxing•1w ago

I’m really excited for lmster and to try it out. It’s essentially what I want from ollama. Ollama has deviated so much from their original core principles. Ollama has been broken and slow to update model support. There’s this “vendor sync” I’ve been waiting (essentially update ggml) for weeks.

PlatoIsADisease•1w ago

What was the original core principle of ollama?

I had used oobabooga back in the day and found ollama unnecessary.

fud101•1w ago

>What was the original core principle of ollama?

Nothing, it was always going to be a rug pull. They leached off llama.cpp.

garyfirestorm•1w ago

Everyone seems to be missing important piece here. Ollama is/was a one click solution for non technical person to launch a local model. It doesn’t need a lot of configuration, detects Nvidia GPU and starts model inferencing with single command. Core principle being your grandmother should be able to launch local AI model without needing to install 100 dependencies.

stuaxo•1w ago

Exactly.

I can be in a non-technical team, and put the LLM code inside docker.

The local dev instruction is to install ollama and use it to pull the models and set some env vars.

The same code can point at bedrock when deployed there.

Using straight llamacpp at the time I wrote that it wasn't as straightforward.

embedding-shape•1w ago

For fun, this is how an actual "non-technical" individual would hear/read your comment:

> Exactly. I can be in a non-technical team, and put the blah inside blah. The blah is to install blah and use it to blah and blah. The same blah can point at blah when blah there. Using blah at the time I wrote that it wasn't as straightforward.

I think when people say "non-technical", it feels like they're talking about "People who work in tech startups, but aren't developers" instead of actually people who aren't technical one bit, the ones who don't know the difference between "desktop" and a "browser" for example. Where you tell them to press any key, and they replied with "What key is that?".

embedding-shape•1w ago

> Ollama is/was a one click solution for non technical person to launch a local model

Maybe it is today, but initially ollama was only a cli, so obviously not for "non technical people" who would have no idea how to even use a terminal. If you hang out in the Ollama Discord (unlikely, as the mods are very ban-happy), you'd see constantly people asking for very trivial help, like how to enter commands in the terminal, and the community stringing them along, instead of just directing them to LM Desktop or something that would be much better for that type of user.

embedding-shape•1w ago

> What was the original core principle of ollama?

One decision that was/is very integral to their architecture is trying to copy how Docker handled registries and storage of blobs. Docker images have layers, so the registry could store one layer that is reused across multiple images, as one example.

Ollama did this too, but I'm unsure of why. I know the author used to work at Docker, but almost no data from weights can be shared in that way, so instead of just storing "$model-name.safetensor/.gguf" on disk, Ollama splits it up into blobs, has it's own index, and so on. For seemingly no gain except making it impossible to share weights between multiple applications.

I guess business-wise, it was easier for them to now make people use their "cloud models" so they earn money, because it's just another registry the local client connects to. But also means Ollama isn't just about running local models anymore, because that doesn't make them money, so all their focus now is on their cloud instead.

At least as a LM Studio, llama.cpp and vLLM user, I can have one directory with weights shared between all of them (granted the format of the weight works in all of them), and if I want to use Ollama, it of course can't use that same directory and will by default store things it's own way.

plagiarist•1w ago

I was looking into what local inference software to use and also found this behavior with models to be onerous.

What I want is to have a directory with models and bind mount that readonly into inference containers. But Ollama would force me to either prime the pump by importing with Modelfiles (where do I even get these?) every time I start the container, or store their specific version of files?

I had trying out vLLM and llama.cpp as my next step in this, I'm glad to hear you are able to share a directory between them.

embedding-shape•1w ago

> What I want is to have a directory with models and bind mount that readonly into inference containers.

Yeah, that's basically what I'm doing, + over network (via Samba). My weights all live on a separate host, which has two Samba shares, one with write access and one read-only. The write one is mounted on my host, and the container where I run the agent mounts the read-only one (and have the source code it works on copied over to the container on boot).

The directory that LM Studio ends up creating and maintaining for the weights, works with most of the tooling I come across, except of course Ollama.

d0mine•1w ago

Ollama vs. llama.cpp is like Docker vs. FreeBSD Jails, Dropbox vs. rsync, jujutsu vs git, etc

Imustaskforhelp•1w ago

LMStudio is great but its still not open source. I wish something better than Ollama can be created honestly similar to LMStudio (atleast its new CLI Part from what I can tell) and create an open source alternative.

I think I am fairly technical but I still prefer how Ollama is simple but I know all the complaints about Ollama and I am really just wishing for a better alternative for the most part.

Maybe just a direct layer on top of vllm or llama.cpp itself?

embedding-shape•1w ago

> Maybe just a direct layer on top of vllm

My dream would be something like vLLM, but without all the Python mess, packaged as a single binary that has both HTTP server + desktop GUI, and can browse/download models. Llama.cpp is like 70% there, but large performance difference between llama.cpp and vLLM for the models I use.

Imustaskforhelp•1w ago

> My dream would be something like vLLM, but without all the Python mess, packaged as a single binary that has both HTTP server + desktop GUI, and can browse/download models. Llama.cpp is like 70% there, but large performance difference between llama.cpp and vLLM for the models I use.

To be honest, I was seeing your comment multiple times and after 6 hours, It suddenly clicked about something new.

I had seen this project on reddit once, https://github.com/GeeeekExplorer/nano-vllm

It's almost as fast (from what I can tell in its readme, faster?) than vllm itself but unfortunately its written in python too.

But the good news is that its much smaller in the whole size of the codebase. Let me paste somethings from its readme

     Fast offline inference - Comparable inference speeds to vLLM
     Readable codebase - Clean implementation in ~ 1,200 lines of Python code
     Optimization Suite - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.

Inference Engine Output Tokens Time (s) Throughput (tokens/s) vLLM 133,966 98.37 1361.84 Nano-vLLM 133,966 93.41 1434.13

So I guess I am pretty sure that you can one-agent-one-human it from python to rust/golang! It can be an open project.

Also speaking of oaoh (as I have started calling it), a bit offtopic but my golang port faces multiple issues as I tried today to make it work. I do feel like rust was a good lang because quite frankly the AI agent or anything instead of wanting to do things with its own hands, really wants to end up wanting/wishing to use Fyne library & the best success I had around going against Fyne was in kimi's computer use where you can say that I got a very very (like only simple text) nothing else png file-esque thing working

If you are interesting emsh. I am quite frankly interested that given that your oaoh project is really high quality. Does it still require the intervention of human itself or can an AI port it itself. Because I have mixed feelings about it.

Honestly It's an open challenge to everybody. I am just really interested in getting to learn something about how LLM's work and some lesson from this whole thing I guess imo.

Still trying to create the golang port as we speak haha xD.

azharav•6d ago

Sh N E Z A R Sh0997585 699

saberience•1w ago

What’s the main use-case for this?

I get that I can run local models, but all the paid for (remote) models are superior.

So is the use-case just for people who don’t want to use big tech’s models? Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots?

Not being cynical here, just wanting to understand the genuine reasons people are using it.

tiderpenger•1w ago

To justify investing a trillion dollars like everything else LLM-related. The local models are pretty good. Like I ran a test on R1 (the smallest version) vs Perplexity Pro and shockingly got better answers running on base spec Mac Mini M4. It's simply not true that there is a huge difference. Mostly it's hardcoded overoptimalization. In general these models aren't really becoming better.

mk89•1w ago

I agree with this comment here.

For me the main BIG deal is that cloud models have online search embedded etc, while this one doesn't.

However, if you don't need that (e.g., translate, summarize text, writing code) probably is good enough.

prophesi•1w ago

So long as the local model supports tool-use, I haven't had issues with them using web search etc in open-webui. Frontier models will just be smarter in knowing when to use tools.

mk89•1w ago

Ok I need to explore this, I didn't do it yet. Thanks.

nunodonato•1w ago

you can do web searches in lm studio. just connect an mcp that does it. Serpapi has an mcp, for example

mark_l_watson•1w ago

Also, I had several experiments where I was interested in just 5 to 10 websites with application specific information so it works nicely for fast dev to spider, keep a local index, then get very low search latency. Obviously this is not a general solution but is nice for some use cases.

dragonwriter•1w ago

> For me the main BIG deal is that cloud models have online search embedded etc, while this one doesn't.

Models do not have online search embedded, they have tool use capabilities (possibly with specialized training for a web search tool), but that's true of many open and weights-available models, and they are run with harnesses that support tools and provide a web search tool (lmstudio is such a harness, and can easily be supplied with a web search tool.)

reactordev•1w ago

Not always. Besides, this allows one to use a post-trained model, a heretic model, an abliterated model, or their own.

I exclusively run local models. On par with Opus 4.5 for most things. gpt-oss is pretty capable. Qwen3 as well.

nubg•1w ago

> On par with Opus 4.5 for most things

Are you asking it for capital cities or what?

reactordev•1w ago

No…

I’m asking it to write C code

biddit•1w ago

Yes, frontier models from the labs are a step ahead and likely will always be, but we've already crossed levels of "good enough for X" with local models. This is analogous to the fact that my iPhone 17 is technically superior to my iPhone 8, but my outcomes for text messaging are no better.

I've invested heavily in local inference. For me, it's a mixture privacy, control, stability, cognitive security.

Privacy - my agents can work on tax docs, personal letters, etc.

Control - I do inference steering with some projects: constraining which token can be generated next at any point in time. Not possible with API endpoints.

Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization due to system load. Worse, they retire models, update their own system prompts, etc. They're not stable.

Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.

metalliqaz•1w ago

I just "invested heavily" (relatively modest, but heavy for me) in a PC for local inference. The RAM was painful. Anyway, for my focused programming tasks the 30B models are plenty good enough.

samarthr1•1w ago

I am extremely fortunate having bought 64GB of CL30 DDR5 Ram for ~200 USD just 4 months ago!

My computer is now worth more than when I bought it

metalliqaz•4d ago

ugh. $650 for 64GB DDR5-6000, CL36

anonym29•1w ago

TL;DR: The classic CIA triad: Confidentiality, Integrity, Availability; cost/price concerns; the leading open-weight models aren't nearly as bad as you might think.

You don't need LM Studio to run local models, it just (was, formerly), a nice UI to download and manage HF models and llama.cpp updates, quickly and easily manually switch between CPU / Vulkan / ROCm / CUDA (depending on your platform).

Regarding your actual question, there are several reasons.

First off, your allusion to privacy - absolutely, yes, some people use it for adult role-play, however, consider the more productive motivations for privacy, too: a lot of businesses with trade secrets they may want to discuss or work on with local models without ever releasing that information to cloud providers, no matter how much those cloud providers pinky promise to never peek at it. Google, Microsoft, Meta, et al have consistently demonstrated that they do not value or respect customer privacy expectations, that they will eagerly comply with illegal, unconstitutional NSA conspiracies to facilitate bulk collection of customer information / data. There is no reason to believe Anthropic, OpenAI, Google, xAI would act any differently today. In fact, there is already a standing court order forcing OpenAI to preserve all customer communications, in a format that can be delivered to the court (i.e. plaintext, or encryption at rest + willing to provide decryption keys to the court), in perpetuity (https://techstartups.com/2025/06/06/court-orders-openai-to-p...)

There are also businesses which have strict, absolute needs for 24/7 availability and low latency, which remote APIs never have offered. Even if the remote APIs were flawless, and even if the businesses have a robust multi-WAN setup with redundant UPS systems, network downtime or even routing issues are more or less an inevitable fact of life, sooner or later. Having local models means you have inference capability as long as you have electricity.

Consider, too, the integrity front: frontier labs may silently modify API-served models to be lower quality for heavy users with little means of detection by end users (multiple labs have been suspected / accused of this; a lack of proof isn't evidence that it didn't happen) or that the API-served models can be modified over time to patch behaviors that may have been previously relied upon for legitimate workloads (imagine a red team that used a jailbreak to get a model to produce code for process hollowing, for instance). This second example absolutely has happened with almost every inference provider.

The open weight local models also have zero marginal cost besides electricity once the hardware is present, unlike PAYG API models, which create financial lock-in and dependency that is in direct contrast with the financial interests of the customers. You can argue about the amortized costs of hardware, but that's a decision for the customer to make using their specific and personal financial and capex / hardware information that you don't have at the end of the day.

Further, the gap between frontier open weight models and frontier proprietary models has been rapidly shrinking and continues to. See Kimi K2.5, Xiaomi MiMo v2, GLM 4.7, etc. Yes, Opus 4.5, Gemini 3 Pro, GPT-5.2-xhigh are remarkably good models and may beat these at the margin, but most work done via LLMs does not need the absolute best model; many people will opt for a model that gets 95% of the output quality of the absolute frontier model when it can be had for 1/20th the cost (or less).

konart•1w ago

For many tasks you don't really need big models. And relatively small model, quantized too can be run on your macbook (not to mention Mac studio).

hickelpickle•1w ago

I've gotten interested in local models recently after trying the here and there for years. We've finally hit the point where small <24GB models are capable of pretty amazing things. One use I have is I have a scraped forum database, and with a 20gb devstral model I was able to get it to select a bunch of random posts related to a species of exotic plants in batches of 5-10 up to n, summarize them into and intern sqllite table, then at the end go through read the interim summarization and write a final document addressing 5 different topics related to users experience growing the species.

Thats what convinced me they are ready to do real work, are they going to replace claude code...not currently. But it is insane to me that such a small model can follow those explicit directions and consistently perform that workflow.

I've during that experimentation, even when not putting the sql explicit it was able to craft the queries on its own from just text description, and has no issue navigating the cli and file system doing basic day to day things.

I'm sure there are a lot of people doing "adult" things, but my interest is sparked because they finally at the level they can be a tool in a homelab, and no longer is llm usage limits subsidized like they used to be. Not to mention I am really disillusioned with big tech having my data or exposing a tool making API calls to them that then can make actions on my system.

I'll still keep using claude code day to day coding. But for small system based tasks I plan on moving to local llms. Their capabilities have inspired me to write my own agentic framework to see what work flows can be put together for just management and automation of day to day task. Ideally it would be nice to just chat with an llm and tell it to add an appointment or call at x time or make sure I do it that day and it can read my schedule and remind-me at a chill time of my day to make the call, and then check up that I followed through. I also plan on seeing if I can also set it up to remind me and help to practice mindfulness and just general stress management I should do. While sure a simple reminder might work, but as someone with adhd who easily forgets reminders as soon as they pop up if I can get to them now, being pestered by an agent that wakes up and engages with me seems like it might be an interesting workflow.

And the hacker aspect, now that they are capable I really want to mess around with persistent knowledge in databases and making them intercommunicate and work together. Might even give them access to rewrite themselves and access the application during run time with a lisp. But to me local llms have gotten to the point they are fun and not annoying. I can run a model that is better than chatgpt 3.5 for the most part, its knowledge is more distilled and narrower, but for what they do understand their correctness is much better.

PlatoIsADisease•1w ago

I originally used local models as a somewhat therapeutic/advice thing. I didn't want to give openAI all my dirt.

But then I decided I'm just a chemical reaction and a product of my environment, so I gave chatGPT all my dirt anyway.

But before, I cared about my privacy.

anon373839•1w ago

> But then I decided I'm just a chemical reaction

That doesn’t address the practical significance of privacy, though. The real risk isn’t that OpenAI employees will read your chats for personal amusement. The risk is that OpenAI will exploit the secrets you’ve entrusted to them, to manipulate you, or to enable others to manipulate you.

The more information an unscrupulous actor has about you, the more damage they can do.

dragonwriter•1w ago

> What’s the main use-case for this?

Running weights available models.

> I get that I can run local models, but all the paid for (remote) models are superior.

If that's clearly true for your use cases, then maybe this isn’t for you.

> So is the use-case just for people who don’t want to use big tech’s models?

Most weights available models are also “big tech’s”, or finetunes of them.

> Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots?

Sure, those are among the use cases. And there can be very good reasons to be concerned about privacy in some applications. But they aren’t the only reasons.

There’s a diversity of weights-available models available, with a variety of specialized strengths. Sure, for general use, the big commercial models may generally be more capable, but they may not be optimal for all uses (especially when cost effectiveness is considered, given that capable weights-available models for some uses are very lightweight.)

marak830•1w ago

I run a separate memory layer between my local and my chat.

Without a ton of hassle I cannot do that with a public model(without paying API pricing).

My responses may be slower, but I know the historical context is going to be there. As well as the model overrides.

In addition I can bolt on modules as I feel like it(voice, avatar, silly tavern to list a few).

I get to control my model by selecting specific ones for tasks, I can upgrade as they are released.

These are the reasons I use local.

I do use Claude for a coding junior so I can assign tasks and review it, purely because I do not have something that can replicate that locally on my setup(hardware wise, but from what I have read local coding models are not matching Claude yet)

That's more than likely a temporary issue(years not weeks with the expensive of things and state of open models specialising in coding).

maxkfranz•1w ago

Yeah, it’s not going to compare to Codex-5.2 or Opus 4.5.

Some non-programming use cases are interesting though, e.g. text to speech or speech to text.

Run a TTS model overnight on a book, and in the morning you’ll get an audiobook. With a simple approach, you’d get something more like the old books on tape (e.g. no chapter skipping), but regardless, it’s a valid use case.

JayDustheadz•1w ago

Which TTS would you suggest? Anything out there that is able to properly see/handle modulation, punctuation and overall sentence 'mood'? I've been looking for something easy to set up but most is either extremely complex or is producing output of relatively poor quality.

maxkfranz•5d ago

I’m still experimenting with them. I suspect you may have to do only one paragraph at a time and concatenate them together. Let me know if you’d be interested in collaborating, as I’m interested in this use case too.

gostsamo•1w ago

currently working on a personal project where part of the pipeline is recognizing lots of images. the employer let me use gemini for personal use, but wasting large amount of tokens on gemini3 pro ocr limited my work. flash gives worse result, but there are ways to retry. good for development, but long term, simpler parts of a pipeline could be dedicated to a local model. I can imagine many other use cases where you want large volume of low difficulty tasks at close to zero cost.

nxobject•1w ago

There are some surprisingly useful "small" use cases for general-purpose LLMs that don't necessarily require broad knowledge – image transcription plus some light post-processing is one I use a lot.

PeterStuer•1w ago

For some projects, you do not want your code or documents leaving the LAN. Many companies have explicit constraints on using external SaaS. It does not mean they restrict to everything 'on prem'. 'Self hosted' can include running an open weights model on multiple rented B200's.

So yes, the tradeoff is security vs capability. The former always comes at a cost.

numpad0•1w ago

Reports of people getting hit by twitchy fingered banbots on cloud LLMs are starting to show up(Gemini bans apparently kill Gmail and GDrive too). Paranoid types like I am appreciate local options that won't get me banned.

anonym29•1w ago

edit: disregard, new version did not respect old version's developer mode setting

nunodonato•1w ago

woah dude, take it easy. There are no missing features, there are more feature. You might just not be finding them where they were before. Remember this is still 0.x, why would the devs be stuck and not be able to improve the UI just because of past decisions?

anonym29•1w ago

edit: disregard, new version did not respect old version's developer mode setting

ffftttfffttt•1w ago

Go to settings developer and enable developer mode

webdevver•1w ago

the reason he (probably) wants that feature so badly is cos it crashes his amdgpu driver when he tries inferencing lol

although, as an amd user, he should know that both vulkan and rocm backends have equal propensity to crap the bed...

thousand_nights•1w ago

man they really butchered the user interface, the "dark" mode now isn't even dark, it's just grey, and it's looking more like a whitespacemaxxed children's toy than a tool for professionals

konart•1w ago

Right now it looks like as VS Code (give or take). Pretty sure both are\will be used by many professionals.

"looks like a toy" has very little to do with its use anyway.

keyle•1w ago

Yeah the theming options are lacking and I could never hack one up to work.

ekianjo•1w ago

yeah it looks worse than before

huydotnet•1w ago

I was hoping for the /v1/messages endpoint to use with Claude Code without any extra proxies :(

anonym29•1w ago

This is a breeze to do with llama.cpp, which has had Anthropic responses API support for over a month now.

On your inference machine:

  you@yourbox:~/Downloads/llama.cpp/bin$ ./llama-server -m <path/to/your/model.gguf> --alias <your-alias> --jinja --ctx-size 32768 --host 0.0.0.0 --port 8080 -fa on

Obviously, feel free to change your port, context size, flash attention, other params, etc.

Then, on the system you're running Claude Code on:

  export ANTHROPIC_BASE_URL=http://<ip-of-your-inference-system>:<port>
  export ANTHROPIC_AUTH_TOKEN="whatever"
  export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
  claude --model <your-alias> [optionally: --system "your system prompt here"]

Note that the auth token can be whatever value you want, but it does need to be set, otherwise a fresh CC install will still prompt you to login / auth with Anthropic or Vertex/Azure/whatever.

huydotnet•1w ago

yup, I've been using llama.cpp for that on my PC, but on my Mac I found some cases where MLX models work best. haven't tried MLX with llama.cpp, so not sure how that will work out (or if it's even supported yet).

huydotnet•1w ago

Well, to whoever downvoted my comment: It's supported now!!!! https://lmstudio.ai/blog/claudecode

behnamoh•1w ago

lmster is what was lacking in lmstudio (yes, they have lms but it lacks so many functionalities that the GUI version has).

but it's a bit too little too late. people running this probably can already setup llama.cpp pretty easily.

lmstudio also has some overhead like ollama; llama.cpp or mlx alone are always faster.

khimaros•1w ago

this is not open source

adastra22•1w ago

What’s the best open source alternative?

jckahn•1w ago

Jan: https://www.jan.ai/

khimaros•1w ago

llama.cpp

atwrk•1w ago

To add a few more details: llama.ccp now both has a web ui out of the box that even supports model switching, and easy model file downloads from huggingface using the cli: '-hf name_of_model:the_quant_you_want'.

PeterStuer•1w ago

LibreChat with vLLM?

https://www.librechat.ai/docs/configuration/librechat_yaml/a...

echelon•1w ago

They have an extensive GitHub full of stuff. What portions are not open source?

Is this like "OpenRouter" where they don't have any of the core product actually available?

tildef•1w ago

It seems their main app is proprietary. See https://lmstudio.ai/app-terms#restrictions-on-use. Example excerpt--though many of the other points in the ToS also makes it very un-FLOSS:

>> You agree that You will not permit any third party to, and You will not itself:[..] (e) reverse engineer, decompile, disassemble, or otherwise attempt to derive the source code for the Software[..]

ssalka•1w ago

Personally, I would not run LM Studio anywhere outside of my local network as it still doesn't support adding an SSL cert. I guess you can just layer a proxy server on top of it, but if it's meant to be easy to set up, it seems like a quick win that I don't see any reason not to build support for.

https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1...

jermaustin1•1w ago

Because adding a caddy/nginx/apache + letsencrypt is a couple of bash commands between install and setup, and those http servers + TLS termination is going to be 100x better than what LMS adds themselves, as it isn't their core competency.

fragmede•1w ago

so have LMS bundle caddy?

dmd•1w ago

Adding Caddy as a proxy server is literally one line in Caddyfile, and I trust Caddy to do it right once more than I trust every other random project to add SSL.

makeramen•1w ago

Tailscale serve

Nijikokun•1w ago

thats why i use caddy or ngrok.ai

sfifs•1w ago

If you're running your apps on Kubernetes, standard ingress supports certs. For small applications, Cloudflare TLS on free tier is dead simple

maxkfranz•1w ago

Cloudflare tunnels makes this easy as it gets. It also makes it easy for only you to have access to it, either through sign in or OTPs.

You don’t want some random person to find your LMStudio service and then point their Opencode at it.

whalesalad•1w ago

tbh I would prefer that an application not do this, and allow me the choice and control of putting a proxy in front of it.

analog could be car infotainment systems: don't give me your half-baked shitty infotainment, i have carplay, let me use it.

chocobaby15•1w ago

When are you guys going to offer cloud inference as well?

embedding-shape•1w ago

Hopefully never, I hope they continue focusing on what they're good at, rather than starting the enshittification process this early. Not sure why Ollama is running towards that, maybe their runway is already shorter than expected?

ai_critic•1w ago

What exactly is the difference between lms and llmsterm?

alasr•1w ago

> What exactly is the difference between lms and llmsterm?

With lms, LM Studio's frontend GUI/desktop application and its backend LLM API server (for OpenAI compatibility API endpoints) are tightly coupled: stopping LM Studio's GUI/desktop application will trigger stopping of LM Studio's backend LLM API server.

With llmsterm, they've been decoupled now; it (llmsterm) enables one, as LM Studio announcement says, to "deploy on servers, deploy in CI, deploy anywhere" (where having a GUI/desktop application doesn't make sense).

ai_critic•1w ago

But like, llmsterm still results in using the `lms` command, right? Or am I misreading the docs?

alasr•1w ago

I think you're reading the docs correct: one still uses "lms server [command]" command to manage an LM Studio (LMS) server.

doanbactam•1w ago

I've been using Ollama for local dev, but the model management here seems easier to use. The new UI looks much cleaner than the previous versions. Has anyone benchmarked the server mode against Ollama yet? The model management here is fantastic, but switching environments is a pain if the API compatibility isn't solid. Let's go with a mix of appreciation for the tool and a technical question about integration/performance, as that's classic HN.

Der_Einzige•1w ago

Why is it that there are ZERO truly prosumer LLM front ends from anyone you can pay?

The closest thing we have to an LLM front end where you can actually CONTROL your model (i.e. advanced sampling settings) is oobabooga/sillytavern - both ultimately UIs designed mostly for "roleplay/cooming". It's the same shit with image gen and ComfyUI too!!!

LM Studio purported to be something like those two, but it has NEVER properly supported even a small fraction of the settings that LLMs use, and thus it's DOA for prosumer/pros.

I'm glad that claude code and moltbot are killing this whole genre of Software since apparently VC backed developers can't be trusted to make it.

echelon•1w ago

I'm working on the image / video space. You can pay us or byok. It's a fair source license, still TBD:

https://github.com/storytold/artcraft

Roadmap: Auth with all frontier AI image/video model providers, FAL, other aggregators. Focus on tangible creation rather than node graphs (for now).

I'm a filmmaker, so I'm making this for my studio and colleagues.

redrove•1w ago

You’re forgetting about Open WebUI.

Der_Einzige•1w ago

Which is still WAY less feature complete than oobabooga/sillytavern and it's not even close.

pram•1w ago

Is there an iOS/Android app that supports the LM Studio API(s) endpoints? That seems to be the "missing" client, especially now with llmster (tbh I haven't looked very hard)

PeterStuer•1w ago

Apps that allow you to configure an OpenAI api endpoint should work.

snvzz•1w ago

Is the GUI still unable to connect to an instance of lm-studio running elsewhere?

hnlmorg•1w ago

How does LM Studio differ from Ollama? Why would I use one rather than the other?

The impression I get is that LM Studio is basically an Ollama-type of solution but with an IDE included -- is that a fair approximation?

Things change so fast in the AI space that I really cannot keep up :(

anhner•1w ago

It offers a GUI for easier configuration and management of models, and it allows you to store/load models as .gguf something ollama doesn't do (it stores the models across multiple files - and yes, I know you can load a .gguf in ollama but it still makes a copy in its weird format so now I need to either have a duplicate on my drive or delete my original .gguf)

hnlmorg•1w ago

Thanks for the insights. I'm not familiar with .gguf. What's the advantage of that format?

atwrk•1w ago

.gguf is the native format of llama.cpp and is widely used for quantized models (models with reduced float accuracy to reduce memory requirements).

llama.cpp is the actual engine running the llms, ollama is a wrapper around it.

embedding-shape•1w ago

> llama.cpp is the actual engine running the llms, ollama is a wrapper around it.

How far did they get with their own inference engine? I seem to recall for the launch of Gemma (or some other model), they also launched their own Golang backend (I think), but never heard anything more about it. I'm guessing they'll always use llama.cpp for anything before that, but did they continue iterating on their own backend and how is it today?

martinald•1w ago

Ollama is CLI/API "first". LM studio is a proper full blown gui with chat features etc. It's far easier to use than Ollama at least for non technical users (though they are increasingly merging in functionality, with LM studio adding CLI/API features and Ollama adding more UI).

james_marks•1w ago

Even as a technical person, when I wanted to play with running models locally, LM Studio turned it into a couple of button clicks.

Without much background, you’re finding models, chatting with them, have an OpenAI-compatible API w/logging. Haven’t seen the new version, but LM Studio was already pretty great.

secult•1w ago

LM Studio is awesome in a way how easily you can start with local models. Nice UX, not needed to tweak every detail, but giving you the options to do so if you want.

neves•1w ago

Does it work with NPUs ?

auscompgeek•1w ago

Depending on what NPU you have yes.

embedding-shape•1w ago

In the end it's llama.cpp doing the inference, so whatever llama.cpp supports, you should be able to use with LM Studio

pzo•1w ago

Finally UI that is not so ugly. Now I'm only wondering if I somehow can setup that I can share the same LLM models between LM Studio and llamabarn/Ollama (so that I don't have to waste storage on duplicated models).

embedding-shape•1w ago

Ollama made the wonderful choice of trying to replicate Docker registries/layers for the model weights, so of course the models you download with Ollama cannot be easily reused with other tooling.

Compared to models downloaded with LM Studio, which are just the directories + the weights as made, you just point llama.cpp/$tool-of-choice and it works.

tarruda•1w ago

These days I don't feel the need to use anything other than llama.cpp server as it has a pretty good web UI and router mode for switching models.

roger_•1w ago

MLX support on Macs was the main reason for me.

embedding-shape•1w ago

I mostly use LM Studio for browsing and downloading models, testing them out quickly, but then actually integrating them is always with either llama.cpp or vLLM. Curious to try out their new cli though and see if it adds any extra benefits on top of llama.cpp.

mycall•1w ago

Concurrency is an important use case when running multiple agents. vLLM can squeeze performance out of your GB10 or GPU that you wouldn't get otherwise.

embedding-shape•1w ago

Also they've just spent more time optimizing vLLM than llama.cpp people done, even when you run just one inference call at a time. Best feature is obviously the concurrency and shared cache though. But on the other hand, new architectures are usually sooner available in llama.cpp than vLLM.

Both have their places and are complementary, rather than competitors :)

tarruda•1w ago

I'm only interested in the local, single user use case. Plus I use a Mac studio for inference, so vLLM is not an option for me.

mycall•1w ago

You can get concurrency gains [0] as local/single user (multi-agent) use case with vLLM with your Mac Studio.

[0] https://youtu.be/Ze5XLooTt6g?t=658

TomMasz•1w ago

I've been using LM Studio for a while, this is a nice update. For what I need, running a local model is more than adequate. As long as you have sufficient RAM, of course.

chris_st•1w ago

My complaint is that LM Studio insists on installing as admin on my Mac. For no apparent reason, and they refuse to say why.

embedding-shape•1w ago

Is this possibly the same as this issue? https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/4...

I've only use LM Desktop on Linux and Windows, never seen anything asking for elevated permissions.

chris_st•1w ago

Nope - on macOS, almost all apps are just "drag this to wherever (usually your own personal application folder)" and they work perfectly, since they don't need admin privileges. But this one insists on running from /Applications - the root application directory - and for reason. To install there, you have to be admin. I really don't want apps installed as admin, and possibly then able to get admin privileges. It's just basic security.

There's a thread on their Discord that was reported in February of last year. No fix, no comments.

embedding-shape•1w ago

Long time ago I used macOS, but aren't you confusing things here? Yes, you need admin permission to put stuff in /Applications, but that doesn't mean the applications inside of /Applications get root access by default, or even in any other mean that applications located elsewhere. Am I getting that wrong?

chris_st•6d ago

Honestly, I don't know! I should write an app and see who it runs as. I did an `ls -l /Applications`, and while every file is owned by `root`, none has the `suid` bit set.

LM Studio doesn't have an installer. Those often have to run as admin, and who knows what they're doing then, so that probably wrongly set my concerns about putting stuff in /Applications/.

I'll dig around the interwebs and see if this is answered elsewhere.

Thanks!

chris_st•6d ago

Turns out the apps in /Application/ run as you. Problem (between keyboard and chair :-) solved.

arajnoha•1w ago

hijacking this, what is the best local model (and tool to use it) for programming, if i only have 256gb ssd on a mac? im very used to codex and while i get that it will never be this smart locally, is there any coding model like it, not too heavy on space?

desipenguin•1w ago

Does this version support only M-Series mac ? Download page (https://lmstudio.ai/download) shows only `M Series` in the running dropdown

Start all of your commands with a comma

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

How we made geo joins 400× faster with H3 indexes

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Jeffrey Snover: "Welcome to the Room"

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Vocal Guide – belt sing without killing yourself

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Microsoft open-sources LiteBox, a security-focused library OS

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

Was Benoit Mandelbrot a hedgehog or a fox?

PC Floppy Copy Protection: Vault Prolok

Dark Alley Mathematics

How to effectively write quality code with AI

Delimited Continuations vs. Lwt for Threads

What Is Ruliology?

Where did all the starships go?

Introducing the Developer Knowledge API and MCP Server

Female Asian Elephant Calf Born at the Smithsonian National Zoo

I now assume that all ads on Apple news are scams

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

Understanding Neural Network, Visually

Why I Joined OpenAI

Learning from context is harder than we thought

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Start all of your commands with a comma

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

How we made geo joins 400× faster with H3 indexes

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Jeffrey Snover: "Welcome to the Room"

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Vocal Guide – belt sing without killing yourself

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Microsoft open-sources LiteBox, a security-focused library OS

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

Was Benoit Mandelbrot a hedgehog or a fox?

PC Floppy Copy Protection: Vault Prolok

Dark Alley Mathematics

How to effectively write quality code with AI

Delimited Continuations vs. Lwt for Threads

What Is Ruliology?

Where did all the starships go?

Introducing the Developer Knowledge API and MCP Server

Female Asian Elephant Calf Born at the Smithsonian National Zoo

I now assume that all ads on Apple news are scams

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

Understanding Neural Network, Visually

Why I Joined OpenAI

Learning from context is harder than we thought

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

LM Studio 0.4

Comments