Go look at the benchmark numbers of qwen3-4B if you think these are unrealistic.
I loaded up a random 12B model on ollama the other day and couldn't believe how competent it seemed and how fast it was, given the machine I was on. A year or so ago, that would not have been the case.
We just need Google or Apple to provide their own equivalents of both Ollama and OpenRouter, so users can either run inference for free with local models, or bring their own key and pay for the tokens or electricity bill themselves. We then just charge a smaller fee for renting or buying our cars.
I think ML is more akin to open source hardware, in the sense that even when there are people with the relevant skills willing to donate their time for free, the cost of actually realizing their ideas is still so high that it's rarely feasible to keep up with commercial projects.
My employer talks about spending tens of millions on AI, but even at this early stage, my experiments indicate that the smaller, locally run models are just fine for a lot of tech and business tasks. This approach has definite privacy advantages, and likely cost advantages, versus pay-per-use LLM APIs.
The problem was that of its top ten book recommendations, only the first three existed; the rest were a casually blended hallucination delivered in perfect English without skipping a beat.
"You like magic? Try reading the Harlew Porthouse series by JRR Marrow, following the orphan magicians adventures in Hogwesteros"
And the closer it gets to the context limit, the deeper this descent into creative derivative madness goes.
It's entertaining but limited in usefulness.
https://gist.github.com/estsauver/a70c929398479f3166f3d69bce...
[0]: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated
It will probably be released within a few hours.
But yeah, waiting is the easier option.
I think you meant Anthropic. OpenAI is "planning" to release an open weight model this year likely competing against the Llama models. [0]
I have never seen Anthropic release an open-weight model at all.
They could've called it Xiaomimo.
Also related: https://en.wikipedia.org/wiki/Xiaomi#Name_etymology
It's clearly impossible for me to try anything in Chinese; I'd need a translation.
The sad reality is that not many people outside of China have enough facility with Mandarin to use those models. Even non-native Mandarin speakers who claim to be "fluent" often mess up intended meaning in text, or make literal translations that wind up making no sense.
Inside China, LLM use will be Mandarin-based. Outside, it seems to me English is the natural choice.
Irony of ironies: probably the best way for a non-Mandarin-speaking layman to test a Mandarin-based model would be to use another LLM to translate prompts into Mandarin.
It's a sad future we're looking at.
Or a brilliant one.
Time will tell.
Uh? Pinyin input is by far the most popular input technique in China. I rarely see anyone using handwriting input.
That being said, it has nothing to do with English winning. It's just a Chinese input method that uses the Latin alphabet. English fluency in China is not very common, especially spoken English.
China could hardly be further from a bilingual society.
Here is the meaning of the name, as described here: https://finance.sina.cn/tech/2020-11-26/detail-iiznctke33979...
在后来的讨论中,我突然想到了我最喜欢的一句话——“佛观一粒米,大如须弥山”。
Translated into English, it means:
“In the later discussions, I suddenly thought of one of my favorite sayings — ‘A Buddha sees a single grain of rice as vast as Mount Sumeru.’”
This expression emphasizes the idea that even something seemingly small (like a grain of rice) can hold immense significance or value when viewed from a different perspective.
Thanks to ChatGPT for translating this.
https://github.com/ollama/ollama/blob/main/docs/modelfile....
Here is my workflow when using Open WebUI:
1. ollama show qwen3:30b-a3b-q8_0 --modelfile
2. In Open WebUI, go to admin -> models, paste the contents of the modelfile, and rename it qwen3:30b-a3b-q8_0-monkversion-1
3. Change parameters, e.g. num_gpu 90 to adjust how many layers are offloaded, etc.
4. Keep or delete the old file
Pay attention to the modelfile: it will show you something like "# To build a new Modelfile based on this, replace FROM with: # FROM qwen3:30b-a3b-q8_0", and you need to make sure the paths are correct. I store my models on a large NVMe drive that isn't the ollama default, which is one example of why that matters.
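For illustration, here's a rough sketch of what the edited Modelfile might end up looking like (the parameter values are just examples, not recommendations):

    # Dumped via: ollama show qwen3:30b-a3b-q8_0 --modelfile
    FROM qwen3:30b-a3b-q8_0    # per the comment in the dump, replace the blob path with the model tag
    PARAMETER num_gpu 90       # number of layers offloaded to the GPU
    PARAMETER num_ctx 8192     # context window size

If you'd rather skip the web UI paste, `ollama create qwen3:30b-a3b-q8_0-monkversion-1 -f Modelfile` registers the variant from the command line.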
EDIT TO ADD: The 'modelfile' workflow is a pain in the booty. It's a dogwater pattern and I hate it. Some of these models are 30 to 60GB and copying the entire thing to change one parameter is just dumb.
However, ollama does a lot of things right, and it makes it easy to get up and running. vLLM, SGLang, mistral.rs, and even llama.cpp require a lot more work to set up.
I meant when you download a gguf file from huggingface, instead of using a model from ollama's library.
This will show a separate entry in `ollama list`, but only copy the Modelfile, not the GGUF.
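Roughly, the HuggingFace route looks like this (the filename and model name here are just examples):

    # Modelfile pointing at a GGUF downloaded from huggingface:
    #   FROM ./Qwen3-30B-A3B-Q8_0.gguf
    ollama create qwen3-local -f Modelfile
    ollama list    # qwen3-local appears as its own entry

Since the blob store is content-addressed, re-running `ollama create` after tweaking a parameter should only write a new Modelfile layer rather than duplicating the multi-GB GGUF.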
Alternatively, if you use the API, you can override parameters "temporarily". Some UIs let you do this easily, at least for common parameters.
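For instance, with ollama's REST API (the model name and values are illustrative):

    curl http://localhost:11434/api/generate -d '{
      "model": "qwen3:30b-a3b-q8_0",
      "prompt": "Why is the sky blue?",
      "options": { "num_gpu": 90, "temperature": 0.6 }
    }'

The "options" apply only to that request; nothing is written back to the Modelfile.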
Notice that most of the models they're comparing against are 7B models. The exception is also an open-weights model (Qwen-2.5-32B-RL-Zero), and even at 32B parameters, it's outperformed by MiMo-7B.
A couple of things stand out to me. First, the 7B model is trained on 25T tokens(!). That's Meta-scale training: Llama 4 Maverick was trained on roughly 22T (and Scout, the smaller model, on 40T).
Second, this is an interesting path to take: not a distilled model or an RL layer to coax reasoning out of another model, but a from-scratch RL-trained model with reasoning baked in. The claims seem to indicate you get a lot of extra per-parameter efficiency doing this.
I don't have experience with Xiaomi models, so I'm cautious about this one until I play with it, but from the stats it looks like a super viable local reasoning model.