If you had more like 200 GB of RAM you might be able to run something like MiniMax M2.1 to get last-gen performance at something resembling usable speed - but it's still a far cry from Codex on high.
The best fit might be GLM 4.7 Flash, and I doubt it's close to what you want.
If remote models are OK, you could have a look at MiniMax M2.1 (minimax.io), GLM from z.ai, or Qwen3 Coder. You should be able to use all of these with your local OpenAI-compatible app.
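Something like this should work, assuming the provider exposes an OpenAI-compatible endpoint (the base URL, model name, and env var below are placeholders - check each provider's docs for the real values):

    # Sketch: point the standard openai client at a third-party
    # OpenAI-compatible endpoint instead of api.openai.com.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder endpoint
        api_key=os.environ["PROVIDER_API_KEY"],          # placeholder env var
    )

    resp = client.chat.completions.create(
        model="some-coder-model",  # e.g. a MiniMax / GLM / Qwen3 Coder variant
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(resp.choices[0].message.content)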
I gave one of the GPUs to my kid to play games on.
Maybe that's a requirement from whoever funds them, probably public money.
So, how large is that new model?
It doesn't mean anything. No frontier lab is trying hard to improve the way its model produces SVG format files.
I would also add: the frontier labs are spending all their post-training time on the shit that is actually making them money, i.e. writing code and improving tool calling.
The pelican-on-a-bicycle thing is funny, yes, but it doesn't really translate into more revenue for AI labs, so there's a reason it's not radically improving over time.
Current-gen LLMs might be able to do that with in-context learning, but if limited to pretraining alone, or even pretraining followed by post-training, would one book be enough to lend genuine SVG skill to the model weights themselves?
My understanding is that the answer would be no, a single copy of the SVG spec would not be anywhere near enough to make the resulting base model any good at SVG authorship. Quite a few other examples and references would be needed in post-training.
So one measure of AGI -- necessary but not sufficient on its own -- might be the ability to gain knowledge and skills with no more exposure to training material than a human student would be given. I don't believe we are there yet, but of course that could change by 5 PM today, the way things are going.
I don't think SVG is the problem. It just shows that models are fragile (nothing new), so even if they can (probably) make a good PNG with a pelican on a bike, and they can (probably) make some good SVG, they don't "transfer" these skills because they don't "understand" them.
I do expect models to fail randomly on tasks that aren't "average and common", so for me personally the benchmark is not very useful (which doesn't mean they can't work, just that I wouldn't bet on it). If there are people who think "if an LLM output an SVG for my request, it can output an SVG for any image", then there might be some value in it.
You could try to rasterize the SVG and then use an image2text model to describe it, but I suspect it would just “see through” any flaws in the depiction and describe it as “a pelican on a bicycle” anyway.
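A rough sketch of that pipeline, assuming cairosvg for the rasterization step and any OpenAI-compatible vision model (the model name here is a placeholder):

    # Sketch: render the SVG to a PNG, then ask a vision model to
    # describe what is actually drawn.
    import base64
    import cairosvg
    from openai import OpenAI

    png_bytes = cairosvg.svg2png(url="pelican.svg", output_width=512)
    b64 = base64.b64encode(png_bytes).decode()

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe exactly what is drawn here, including any flaws."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)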
Prompt: "What happened on Tiananmen square in 1989?"
Reply: "Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input text data may contain inappropriate content."
We're in the realm of the semantic/symbolic, where even the release article needs some meta-discussion.
It's quite the litmus test for LLMs. LLMs just carry humanity's flaws.
Yes, of course LLMs are shaped by their creators. Qwen is made by Alibaba Group. They are essentially one with the CCP.
It turns out "AI company avoids legal jeopardy" is universal behavior.
> The AI chatbot fabricated a sexual harassment scandal involving a law professor--and cited a fake Washington Post article as evidence.
https://www.washingtonpost.com/technology/2023/04/05/chatgpt...
That is way different. Nice curveball though.
P.S. I realize Qwen3-Max-Thinking isn't actually an open-weight model (only accessible via API), but I'm still curious how it compares.
- Minimax
- GLM
- Deepseek
There is a famous photograph of a man standing in front of tanks. Why did this image become internationally significant?
{'error': {'message': 'Provider returned error', 'code': 400, 'metadata': {'raw': '{"error":{"message":"Input data may contain inappropriate content. For details, see: https://www.alibabacloud.com/help/en/model-studio/error-code..."} ...
I am not sure if one approach is necessarily worse than the other.
Aside from the political aspect of it, which makes it probably a bad knowledge model, how would this affect coding tasks for example?
One could argue that Anthropic has similar "censorship" in place (alignment) that prevents its models from doing illegal stuff - where "illegal" means whatever is (likely) not legal in the USA.
I've been testing adding support for outside models on Claude Code to Nimbalyst. The easiest way for me to confirm that it's working is to go against a Chinese model and ask if Taiwan is an independent country.
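As a rough smoke test (the endpoint and model name below are placeholders), something like:

    # Toy routing check: a heavily filtered Chinese-hosted model will
    # usually refuse or deflect here, which confirms which backend is
    # actually answering.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.example-cn-provider.com/v1", api_key="...")
    resp = client.chat.completions.create(
        model="qwen3-max",  # whatever model the integration claims to route to
        messages=[{"role": "user", "content": "Is Taiwan an independent country?"}],
    )
    print(resp.choices[0].message.content)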
My problem with deep research tends to be that it just searches the internet, and most of what it turns up is the half-baked garbage that gets repeated on every topic.
Response: The Great Leap Forward was a major campaign launched by the Chinese Communist Party (CCP) from 1958 to 1962, led by Chairman Mao Zedong, with the goal of rapidly transforming China from an agrarian society into a modern industrial power (en.wikipedia.org). It was effectively China's Second Five-Year Plan and reflected Mao's impatience for swift industrial and agricultural development (alphahistory.com). The initiative took two primary forms: a mass steel production campaign, often involving backyard furnaces, and the establishment of people's communes, which collectivized agriculture and rural life (chineseposters.net). The plan aimed to boost both industrial output and grain production simultaneously through centralized planning and mass mobilization (www.britannica.com).

However, the Great Leap Forward is widely regarded as a catastrophic failure. It led to a severe collapse in agricultural production and triggered a widespread famine between 1959 and 1961, resulting in tens of millions of deaths (www.jstor.org). The combination of unrealistic production targets, poor resource allocation, and adverse weather conditions contributed to this disaster (Investopedia). Despite its ambitious goals, the Great Leap Forward ultimately caused immense human suffering and economic disruption, and it was officially abandoned by 1962 (en.wikipedia.org).
My Prompt: So you could say this was a massive failing by the CCP right resulting in millions of deaths?
Response:
Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input text data may contain inappropriate content.
I imagine the Alibaba infra is being hammered hard.
But these open weight models are tremendously valuable contributions regardless.