OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
594•klaussilveira•11h ago•176 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
901•xnx•17h ago•545 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
22•helloplanets•4d ago•17 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
95•matheusalmeida•1d ago•22 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
28•videotopia•4d ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
203•isitcontent•11h ago•24 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
199•dmpetrov•12h ago•91 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
313•vecti•13h ago•137 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
353•aktau•18h ago•176 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
355•ostacke•17h ago•92 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
459•todsacerdoti•19h ago•231 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
24•romes•4d ago•3 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
259•eljojo•14h ago•155 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
80•quibono•4d ago•19 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
392•lstoll•18h ago•266 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
7•bikenaga•3d ago•1 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
53•kmm•4d ago•3 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
3•jesperordrup•1h ago•0 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
235•i5heu•14h ago•178 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
46•gfortaine•9h ago•13 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
122•SerCe•7h ago•103 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
136•vmatsiiako•16h ago•60 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
68•phreda4•11h ago•12 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
271•surprisetalk•3d ago•37 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
25•gmays•6h ago•7 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1044•cdrnsf•21h ago•431 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
13•neogoose•4h ago•9 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
171•limoce•3d ago•92 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
60•rescrv•19h ago•22 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
89•antves•1d ago•66 comments

Qwen-Image: Crafting with native text rendering

https://qwenlm.github.io/blog/qwen-image/
544•meetpateltech•6mo ago
https://huggingface.co/Qwen/Qwen-Image

https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Q...

Comments

djoldman•6mo ago
Check out section 3.2, Data Filtering:

https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Q...

numpad0•6mo ago
It's also kind of interesting that no languages other than English and Chinese are named or shown...
entropie•6mo ago
I didn't read it; my first prompt was a sentence in German and it delivered (on the HF demo).
nickandbro•6mo ago
The fact that it doesn't change the images like 4o image gen is incredible. Often when I try to tweak someone's clothing using 4o, it also tweaks their face. This one seems to apply those recognizable AI artifacts only to the elements that need to be edited.
herval•6mo ago
You can select the area you want edited on 4o, and it’ll keep the rest unchanged
barefootford•6mo ago
gpt doesn't respect masks
icelancer•6mo ago
Correct. Have tried this without much success despite OpenAI's claims.
vunderba•6mo ago
That's why Flux Kontext was such a huge deal - it gave you the power of img2img inpainting without needing to manually mask the content.

https://mordenstar.com/blog/edits-with-kontext

diggan•6mo ago
Seems strange not to include the prompts themselves, if people are curious to try replicating it themselves.
vunderba•6mo ago
Well.... that's a good idea - I'll see if I can dig them up!
artninja1988•6mo ago
Insane how many good Chinese open source models they've been releasing. This really gives me hope
owebmaster•6mo ago
I have the impression this might be a strategy to help boost the AI bubble. Big tech capex rn is too big to fail
tokioyoyo•6mo ago
Taking a concrete lead in LLM-world would be a big national win for China.
anon191928•6mo ago
It will take years for people to use these but Adobe is not alone.
herval•6mo ago
Adobe has never been alone. Photoshop’s AI stuff is consistently behind OSS models and workflows. It’s just way more convenient
dvt•6mo ago
I think Adobe is also very careful with copyrighted content not being a part of their models, which inherently makes them of lower quality.
herval•6mo ago
They have a much better and cleaner dataset than Stable Diffusion & others, so I’d expect it to be better with some kinds of images (photos in particular)
doctorpangloss•6mo ago
as long as you don't consider the part of the model which understands text as part of the model, and as long as you don't consider copyrighted text content copyrighted :)
yjftsjthsd-h•6mo ago
Wow, the text/writing is amazing! Also the editing in general, but the text really stands out
rushingcreek•6mo ago
Not sure why this isn't a bigger deal: it seems like this is the first open-source model to beat gpt-image-1 in all respects while also beating Flux Kontext in terms of editing ability. This seems huge.
zamadatix•6mo ago
It's only been a few hours and the demo is constantly erroring out, people need more time to actually play with it before getting excited. Some quantized GGUFs + various comfy workflows will also likely be a big factor for this one since people will want to run it locally but it's pretty large compared to other models. Funnily enough, the main comparison to draw might be between Alibaba and Alibaba. I.e. using Wan 2.2 for image generation has been an extremely popular choice, so most will want to know how big a leap Qwen-Image is from that rather than Flux.

The best time to judge how good a new image model actually is seems to be about a week from launch. That's when enough pieces have fallen into place that people have had a chance to really mess with it and come out with 3rd party pros/cons of the models. Looking hopeful for this one though!

rushingcreek•6mo ago
I spun up an H100 on Voltage Park to give it a try in an isolated environment. It's really, really good. The only area where it seems less strong than gpt-image-1 is in generating images of UI (e.g. make me a landing page for Product Hunt in the style of Studio Ghibli), but other than that, I am impressed.
hleszek•6mo ago
It's not clear from their page but the editing model is not released yet: https://github.com/QwenLM/Qwen-Image/issues/3#issuecomment-3...
tetraodonpuffer•6mo ago
I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

As an aside, I am not sure why for LLM models the technology to spread among multiple cards is quite mature, while for image models, despite also using GGUFs, this has not been the case. Maybe as image models become bigger there will be more of a push to implement it.

TacticalCoder•6mo ago
> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

40 GB of VRAM? So two GPUs with 24 GB each? That's pretty reasonable compared to the kind of machine needed to run the latest Qwen coder models (which, btw, are close to SOTA: they also beat proprietary models on several benchmarks).

cellis•6mo ago
A 3090 + 2x Titan XP? Technically I have 48 GB, but I don't think you can "split it" over multiple cards. At least with Flux, it would OOM the Titans and allocate the full 3090.
AuryGlenz•6mo ago
You can’t split image models over 2 GPUs like you can LLMs.
BoredPositron•6mo ago
They also released an inference server for their models. Wan and qwen-image can be split without problems. https://github.com/modelscope/DiffSynth-Engine
AuryGlenz•6mo ago
Unless I missed something just from skimming their tutorial it looks like they can do parallelism to speed things up with some models, not actually split the model (apart from the usual chunk offloading techniques).
cma•6mo ago
If it's 40 GB, you can lightly quantize it and fit it on a 5090.
AuryGlenz•6mo ago
Which very few people have, comparatively.

Training it will also be out of reach for most. I’m sure I’ll be able to handle it on my own 5090 at some point but it’ll be slow going.

reissbaker•6mo ago
40GB is small IMO: you can run it on a mid-tier Macbook Pro... or the smallest M3 Ultra Mac Studio! You don't need Nvidia if you're doing at-home inference, Nvidia only becomes economical at very high throughput: i.e. dedicated inference companies. Apple Silicon is much more cost effective for single-user for the small-to-medium-sized models. The M3 Ultra is ~roughly on par with a 4090 in terms of memory bandwidth, so it won't be much slower, although it won't match a 5090.

Also for a 20B model, you only really need 20GB of VRAM: FP8 is near-identical to FP16, it's only below FP8 that you start to see dramatic drop-offs in quality. So literally any Mac Studio available for purchase will do, and even a fairly low-end Macbook Pro would work as well. And a 5090 should be able to handle it with room to spare as well.
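
To put rough numbers on the FP16/FP8 point above, here is a back-of-the-envelope sketch for a 20B-parameter model. It counts weights only; activations, the text encoder, and the VAE add overhead on top, so treat these as lower bounds:

    # Approximate weight memory for a 20B-parameter model at various precisions.
    # Weights only; real usage is higher once activations and other components load.
    PARAMS = 20e9

    for name, bytes_per_param in [("FP16/BF16", 2.0), ("FP8", 1.0), ("NF4/FP4", 0.5)]:
        gib = PARAMS * bytes_per_param / 1024**3
        print(f"{name}: ~{gib:.0f} GiB of weights")
    # Prints roughly: FP16/BF16 ~37 GiB, FP8 ~19 GiB, NF4/FP4 ~9 GiB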

RossBencina•6mo ago
Does M3 Ultra or later have hardware FP8 support on the CPU cores?
reissbaker•6mo ago
Ah, you're right: it doesn't have dedicated FP8 cores, so you'd get significantly worse performance (a quick Google search implies 5x worse). Although you could still run the model, just slowly.

Any M3 Ultra Mac Studio, or midrange-or-better Macbook Pro, would handle FP16 with no issues though. A 5090 would handle FP8 like a champ and a 4090 could probably squeeze it in as well, although it'd be tight.

slickytail•6mo ago
All of this only really applies to LLMs though. LLMs are memory bound (due to higher param counts, KV caching, and causal attention) whereas diffusion models are compute bound (because of full self attention that can't be cached). So even if the memory bandwidth of an M3 ultra is close to an Nvidia card, the generation will be much faster on a dedicated GPU.
dur-randir•6mo ago
Memory bandwidth is only relevant for comparing LLM performance. For image generation, the limiting factor is compute, and Apple sucks with it.
BoredPositron•6mo ago
If you want to wait 20 minutes for one image you can certainly run it on a macbook pro.
roenxi•6mo ago
The quality doesn't have to get much higher for that to be a great deal. For humans the wait time is typically measured in days.
BoredPositron•6mo ago
Tell me you have no experience with generative ai image models nor with human artists.
roenxi•6mo ago
What experience do you want to point to? I've never seen an artist streaming where they can draw something equivalent to a good piece of AI artwork in 20 minutes. Their advantage right now comes from a higher overall cap on the quality of the work. Minute for minute, AIs are much better. It is just that it is pointless giving a typical AI more than a little time on a GPU, because current models can't consistently improve their own work.
jacquesm•6mo ago
"a good piece of AI artwork"

You really don't understand art. At all.

roenxi•6mo ago
If you need a hug, I suspect unfortunately I am on the wrong continent. Try thinking some positive thoughts.
jug•6mo ago
I think it does way more than gpt-image-1 too?

Besides style transfer, object additions and removals, text editing, manipulation of human poses, it also supports object detection, semantic segmentation, depth/edge estimation, super-resolution and novel view synthesis (NVS) i.e. synthesizing new perspectives from a base image. It’s quite a smorgasbord!

Early results indicate to me that gpt-image-1 has a bit better sharpness and clarity but I’m honestly not sure if OpenAI doesn’t simply do some basic unsharp mask or something as a post-processing step? I’ve always felt suspicious about that, because the sharpness seems oddly uniform even in out-of-focus areas? And sometimes a bit much, even.

Otherwise, yeah this one looks about as good.

Which is impressive! I thought OpenAI had a lead here from their unique image generation solution that’d last them this year at least.

Oh, and Flux Krea has lasted four days since announcement! In case this one is truly similar in quality to gpt-image-1.

jacooper•6mo ago
Not to mention, flux models are for non-commercial use only.
doctorpangloss•6mo ago
the license for flux models is $1,000/mo, hardly an obstacle to any serious commercial usage
liuliu•6mo ago
Per 100k images. And it is additionally $0.01 per image. Considering an H100 is $1.5 per hour and you can get 1 image per 5s, we are talking about a bare-metal cost of ~$0.002 per image + $0.01 license cost.
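
Spelling out that arithmetic (figures taken from the comment above; the 5 s/image throughput is an assumption that varies with hardware and step count):

    # Per-image bare-metal cost estimate from the figures above.
    H100_PER_HOUR = 1.50       # $/hour for a rented H100
    SECONDS_PER_IMAGE = 5      # assumed generation time per image
    LICENSE_PER_IMAGE = 0.01   # $/image license fee mentioned above

    images_per_hour = 3600 / SECONDS_PER_IMAGE            # 720 images/hour
    compute_cost = H100_PER_HOUR / images_per_hour        # ~$0.0021/image
    print(f"compute ~${compute_cost:.4f}, total ~${compute_cost + LICENSE_PER_IMAGE:.4f} per image")
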
Mtinie•6mo ago
The pricing seems reasonable for a SOTA class model that needs to be commercially viable or it dies.
liuliu•6mo ago
Yeah, not trying to argue either way other than saying contrary to the parent comment, licensing cost is a significant component of the cost equation.
minimaxir•6mo ago
With the notable exception of gpt-image-1, discussion about AI image generation has become much less popular. I suspect it's a function of a) AI discourse being dominated by AI agents/vibe coding and b) the increasing social stigma of AI image generation.

Flux Kontext was a gamechanger release for image editing and it can do some absurd things, but it's still relatively unknown. Qwen-Image, with its more permissive license, could lead to much more innovation once the editing model is released.

doctorpangloss•6mo ago
gpt-image-1 is the League of Legends of image generation. It is a tool in front of like 30 million DAUs...
ACCount36•6mo ago
Social stigma? Only if you listen to mentally ill Twitter users.

It's more that the novelty just wore off. Mainstream image generation in online services is "good enough" for most casual users - and power users are few, and already knee deep in custom workflows. They aren't about to switch to the shiny new thing unless they see a lot of benefits to it.

ants_everywhere•6mo ago
There's no social stigma to using AI image generation.

There is what's probably better described as a bullying campaign. People tried the same thing when synthesizers and cameras were invented. But nobody takes it seriously unless you're already in the angry person fandom.

In practice AI image generation is ubiquitous at this point. AI image editing is also built into all major phones.

torginus•6mo ago
There absolutely is. Every time someone uses an AI image in a presentation slide, or in an article to illustrate a point, everybody just rolls their eyes. In my opinion, a stock photo or even nothing is preferable to a low-effort AI image.
orbital-decay•6mo ago
Who is everybody? How do you know? What is your personal bubble? Could it be you're presenting your opinion as the thing that commonly happens?
orbital-decay•6mo ago
Responding to myself, as I realized that my post above feels too dismissive. Being a long-time privacy advocate for non-tech-adjacent people, I'm perfectly aware of my bubble and biases. For any normal person, anything I say about digital privacy sounds absolutely abstract and detached from real life, where convenience and low effort dominate everything else. Even in 2025, with all the political shenanigans, they just fail to see the link and how it applies to their life. AI imagegen is the same, from my observations: most concerns are contained in a tiny bubble of perpetually online people. Not even all artists share the loud opinions (for reference, I used to manage a couple hundred artists), especially not VFX and 3D folks. And that tiny bubble only really exists in the anglosphere; you'll see a completely different picture in other cultural bubbles. There's absolutely no stigma of any kind outside of it.
debugnik•6mo ago
Useless AI art (which is almost all of it) is not like the camera or the synthesizer; it's closer to when 50-60yo moms were sharing Minion memes on Facebook: cringe and tasteless. It getting better won't make it more accepted; it will simply make people suspicious of actual art until no one really gives a chance to any of it.
roenxi•6mo ago
Your argument might actually be suggesting that you don't like art in general more than that there is a stigma against AI. If there is no value in artisanal art that differentiates it from AI-produced works and therefore both will be discarded as the quality converges, what was supposed to be the value in art to start with?
debugnik•6mo ago
So far, the times had allowed artworks to be proxies for the artistry behind them; the artwork itself conveyed enough information to appreciate it. But as forgery of the art process itself spreads, that signal disappears and artworks, out of context, simply are. The artwork is still necessary, but now insufficient, to understand and appreciate its artistry, because there might not be any, or at least not any intentional one.
wsintra2022•6mo ago
I think it's revolutionary. My use case has been creating visuals for use in various VDMX workflows. One cool trick I've found has been generating starter images with green screens, then putting those into my local LTX video creation workflow, then using VDMX to build a chroma layer with the green-screen video and go from there. Lots and lots of creative fun. So no, not useless AI art.
debugnik•6mo ago
I've qualified with "useless" for a reason. It's cool if you've got a novel use case, but so far I think most uses of AI art are either uncanny filler for blogs and slides; or a driver for the deprofessionalization and commoditization of artworks, with AI art producers flooding art sites to fight regular artists for attention, and industry forcing artists to paint over AI generated works (already common in mobile games) until their cheaper substitutes can replace them, and their next job forces them to set art aside.
toisanji•6mo ago
how can it beat gpt-image-1 if there is no image editor?
dewarrn1•6mo ago
Slightly hyperbolic: gpt-image-1 is better on at least a couple of the text metrics.
vunderba•6mo ago
I've been playing around with it for the past hour. It's really good but from my preliminary testing it definitely falls short of gpt-image-1 (or even Imagen 3/4) where reasonably complex strict prompt adherence is concerned. Scored around ~50% where gpt-image-1 scored ~75%. Couldn't handle the maze, Schrödinger's equation, etc.

https://genai-showdown.specr.net

supermatt•6mo ago
Is it fair to call the OpenAI octopus “real”?
vunderba•6mo ago
Those are the wrong images - the CDN was caching older media - I've since purged it so the right ones should show up now. Thanks for the call out!
supermatt•6mo ago
No problem. I am seeing the correct image now.
bavell•6mo ago
Fantastic comparisons! Great to see the limitations of the latest models.
xarope•6mo ago
interesting to see how many still can't handle the nine pointed star correctly
nilsherzig•6mo ago
Great work, thanks.

Midjourney's images are the only ones which don't make me uncomfortable (most of the time); hopefully they can fix their prompt adherence.

jgalt212•6mo ago
I too have been underwhelmed by these pelican-on-a-bicycle creators. We used to use them for company blog art, but lately we've switched to using attributed images from Wikimedia Commons.
cubefox•6mo ago
Prompt idea: "A person holding a wooden Penrose triangle." Only GPT-4o image generation is able to make Penrose triangles, as far as I can tell.
imcritic•6mo ago
Thanks for that comparison/overview/benchmark!

However, you have mistakenly marked some answers as correct in the octopus prompt: only one generated image shows the octopus with sock puppets on all of its tentacles, and you marked that one image as incorrect because the socks look more like gloves.

blueboo•6mo ago
Do try it. The image quality and diversity is pretty shocking and not in a good way.
SV_BubbleTime•6mo ago
Considering they have not released their image editor weights, I'm not sure how you could conclude that it is better than Flux Kontext aside from the graphs they put out.

But, obviously you wouldn’t do that. Right? Did you look at the scaling on their graphs?

rapamycin•6mo ago
It’s either a shill or just an ai bot
rapamycin•6mo ago
Have you tried the image editing?
rwmj•6mo ago
This may be obvious to people who do this regularly, but what kind of machine is required to run this? I downloaded & tried it on my Linux machine that has a 16GB GPU and 64GB of RAM. This machine can run SD easily. But Qwen-image ran out of space both when I tried it on the GPU and on the CPU, so that's obviously not enough. But am I off by a factor of two? An order of magnitude? Do I need some crazy hardware?
zippothrowaway•6mo ago
You're probably going to have to wait a couple of days for 4 bit quantized versions to pop up. It's 20B parameters.
pollinations•6mo ago

    # Minimal imports/setup so the snippet runs as-is (model id from the HF
    # repo linked above; device assumed to be a CUDA GPU).
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    model_name = "Qwen/Qwen-Image"
    device = "cuda"

    # Configure NF4 quantization
    quant_config = PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
        components_to_quantize=["transformer", "text_encoder"],
    )

    # Load the pipeline with NF4 quantization
    pipe = DiffusionPipeline.from_pretrained(
        model_name,
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
        low_cpu_mem_usage=True,
    ).to(device)
Seems to use 17 GB of VRAM like this.

Update: doesn't work well. This approach seems to be recommended instead: https://github.com/QwenLM/Qwen-Image/pull/6/files
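
Whichever loading approach ends up working, a minimal generation call would look roughly like the sketch below; the prompt, step count, and output path are placeholders, and the standard diffusers text-to-image call signature is assumed:

    # Minimal text-to-image call with a loaded pipeline (sketch).
    prompt = "A coffee shop window with 'Qwen Coffee' painted on the glass"  # placeholder prompt
    image = pipe(prompt, num_inference_steps=50).images[0]
    image.save("qwen_image_test.png")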

mortsnort•6mo ago
I believe it's roughly the same size as the model files. If you look in the transformers folder you can see there are around nine 5 GB files, so I would expect you need ~45 GB of VRAM on your GPU. Usually quantized versions of models are eventually released/created that can run on much less VRAM, but with some quality loss.
foobarqux•6mo ago
Why doesn't huggingface list the aggregate model size?
matcha-video•6mo ago
Huggingface is just a git hosting service, like github. You can add up the sizes of all the files in the directory yourself
AuryGlenz•6mo ago
That’s what we have computers for though - to compute.
simonw•6mo ago
I've been bugging them about this for a while. There are repos that contain multiple model weights in a single repo which means adding up the file sizes won't work universally, but I'd still find it useful to have a "repo size" indicator somewhere.

I ended up building my own tool for that: https://tools.simonwillison.net/huggingface-storage
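
For a quick one-off check, the huggingface_hub client can also total up the file sizes in a repo; a rough sketch (repo id taken from the link in the post):

    # Rough sketch: sum the file sizes of a Hugging Face model repo.
    from huggingface_hub import HfApi

    info = HfApi().model_info("Qwen/Qwen-Image", files_metadata=True)
    total_bytes = sum(f.size or 0 for f in info.siblings)
    print(f"~{total_bytes / 1024**3:.1f} GiB across {len(info.siblings)} files")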

bavell•6mo ago
I've been wondering this for literally years now...
Gracana•6mo ago
HF does this for ggufs, and it’ll show you what quantizations will work on the GPU(s) you’ve selected. Hopefully that feature gets expanded to support more model types.
halJordan•6mo ago
Model size ≈ file size at FP8, so if this was released at FP16 then it's 40-ish GB; quantized to FP4, it's 10-ish GB.
TacticalCoder•6mo ago
> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

For PCs, I take it you need one that has two PCIe 4.0 x16 or more recent slots? As in: quite a few consumer motherboards. You then put in two GPUs with 24 GB of VRAM each.

A friend runs this (I don't know if they've tried this Qwen-Image yet): it's not an "out of this world" machine.

ticulatedspline•6mo ago
Maybe not "out of this world," but still not cheap: probably $4,000 with 3090s. Pretty big chunk of change for some AI pictures.
AuryGlenz•6mo ago
You can’t split diffusion models like that.
icelancer•6mo ago
> This may be obvious to people who do this regularly

This is not that obvious. Calculating VRAM usage for VLMs/LLMs is something of an arcane art. There are about 10 calculators online you can use and none of them work. Quantization, KV caching, activation, layers, etc all play a role. It's annoying.

But anyway, for this model, you need 40+ GB of VRAM. System RAM isn't going to cut it unless it's unified RAM on Apple Silicon, and even then, memory bandwidth is shot, so inference is much much slower than GPU/TPU.

cellis•6mo ago
Also, I think you need a 40 GB "card," not just 40 GB of VRAM. I wrote about this upthread; you're probably going to need one card, and I'd be surprised if you could chain several GPUs together.
rapfaria•6mo ago
Not sure what you mean, or if you're new to LLMs, but two RTX 3090s will work for this, and even lower-end cards (RTX 3060) will once it's GGUF'd.
karolist•6mo ago
do you mean https://github.com/pollockjj/ComfyUI-MultiGPU? One GPU would do the computation, but others could pool in for VRAM expansion, right? (I've not used this node)
AuryGlenz•6mo ago
Nah, that won’t gain you much (if anything?) over just doing the layer swaps on RAM. You can put the text encoder on the second card but you can also just put it in your RAM without much for negatives.
axoltl•6mo ago
This isn't a transformer, it's a diffusion model. You can't split diffusion models across compute nodes.
icelancer•6mo ago
Oh right, I forgot some diffusion models can't offload / split layers. I don't use vision generation models much at all - was just going off LLM work. Apologies for the potential misinformation.
xarope•6mo ago
will the new AMD AI CPUs work? like an AI HX 395 or the slower 370? I'm stuck on an A2000 w/16GB of VRAM and wondering what's a worthwhile upgrade.
AuryGlenz•6mo ago
It may fit but image generation on anything but Nvidia is so slow it won’t be worth it.
liuliu•6mo ago
16GiB RAM with 8-bit quantization.

This is a slightly scaled up SD3 Large model (38 layers -> 60 layers).

philipkiely•6mo ago
For prod inference, 1xH100 is working well.
ethan_smith•6mo ago
Qwen-Image requires at least 24GB VRAM for the full model, but you can run the 4-bit quantized version with ~8GB VRAM using libraries like AutoGPTQ.
cjtrowbridge•6mo ago
two p40 cards together will run this for under $300
oceanplexian•6mo ago
Does anyone know how they actually trained text rendering into these models?

To me they all seem to suffer from the same artifacts, that the text looks sort of unnatural and doesn't have the correct shadows/reflections as the rest of the image. This applies to all the models I have tried, from OpenAI to Flux. Presumably they are all using the same trick?

yorwba•6mo ago
It's on page 14 of the technical report. They generate synthetic data by putting text on top of an image, apparently without taking the original lighting into account. So that's the look the model reproduces. Garbage in, garbage out.

Maybe in the future someone will come up with a method for putting realistic text into images so that they can generate data to train a model for putting realistic text into images.
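
As I read the description above, the naive overlay approach amounts to something like this sketch (the background path, font, and caption are placeholders), which is exactly why the pasted text ignores the scene's lighting:

    # Naive synthetic text-rendering sample: paste text onto an existing photo
    # without accounting for lighting, perspective, or surface geometry.
    from PIL import Image, ImageDraw, ImageFont

    img = Image.open("background.jpg").convert("RGB")  # any source photo (placeholder path)
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()                    # a real pipeline would vary fonts and sizes
    draw.text((40, 40), "Hello, Qwen-Image!", font=font, fill="white")
    img.save("synthetic_text_sample.jpg")              # paired with a caption containing the text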

doctorpangloss•6mo ago
I'm not sure that's as much garbage as you suggest; surely it is helpful for generalization, yes? Kind of the point of self-supervised models.
halJordan•6mo ago
If you think diffusing legible, precise text from pure noise is garbage, then wtf are you doing here. The arrogance of the IT crowd can be staggering at times.
bavell•6mo ago
They're referring to the training data being garbage, not the diffusion process.
Maken•6mo ago
Wouldn't it make sense to use rendered images for that?
sampton•6mo ago
Short Canva.
esafak•6mo ago
Team Qwen: Please stop ripping off Studio Ghibli to demo your product.
gilgoomesh•6mo ago
That entire banner is pure copyright infringement.
Destiner•6mo ago
The text rendering is impressive, but I don't understand the value — wouldn't it be easier to add any text that you like in Figma?
doctorpangloss•6mo ago
the value is: the absence of text where you expect it, and the presence of garbled text, are dead giveaways of AI generation. i'm not sure why you are being downvoted, compositing text seems like a legitimate alternative.
sipjca•6mo ago
it seems like the value is that you don't need another tool to composite the text. especially for users who aren't aware of figma/photoshop nor how to use them (many many many people)
fc417fc802•6mo ago
And if you want the text to faithfully follow the surface of the object (e.g., tattoos), I don't think the post-AI-gen manual editing approach is going to be so straightforward.
askl•6mo ago
If you mass-publish ChatGPT-generated books on Amazon, it might be pretty useful.
Uehreka•6mo ago
I’m interested to see what this model can do, but also kinda annoyed at the use of a Studio Ghibli style image as one of the first examples. Miyazaki has said over and over that he hates AI image generation. Is it really so much to ask that people not deliberately train LoRAs and finetunes specifically on his work and use them in official documentation?

It reminds me of how CivitAI is full of “sexy Emma Watson” LoRAs, presumably because she very notably has said she doesn’t want to be portrayed in ways that objectify her body. There’s a really rotten vein of “anti-consent” pulsing through this community, where people deliberately seek out people who have asked to be left out of this and go “Oh yeah? Well there’s nothing you can do to stop us, here’s several terabytes of exactly what you didn’t want to happen”.

aabhay•6mo ago
Seems a bit drastic to compare Ghibli style transfer to revenge porn, but you do you I guess.
Uehreka•6mo ago
It’s the anti-consent thing that ties them together. The idea of “You asked us to leave you alone, which is why we’re targeting you.”
littlestymaar•6mo ago
Why are you talking about revenge porn here?
topato•6mo ago
I mean, did you really expect anything more from the internet? Maybe I'm wrong, but hentai, erotic roleplay, and nudify applications seem to still represent a massive portion of AI use cases. At least in the case of ero RP, perhaps the exploitation of people for pornography might be lessened....
Uehreka•6mo ago
I get that if you can imagine something, it exists, and also there is porn of it.

What disappoints me is how aligned the whole community is with its worst exponents. That someone went “Heh heh, I’m gonna spend hours of my day and hundreds/thousands of dollars in compute just to make Miyazaki sad.” and then influencers in the AI art space saw this happen and went “Hell yeah let’s go” and promoted the shit out of it making it one of the few finetunes to actually get used by normies in the mainstream, and then leaders in this field like the Qwen team went “Yeah sure let’s ride the wave” and made a Studio Ghibli style image their first example.

I get that there was no way to physically stop a Studio Ghibli LoRA from existing. I still think the community’s gleeful reaction to it has been gross.

bongodongobob•6mo ago
Whatever. "Studio Ghibli style" is so loose of a definition to begin with. You can't own a "style" anyway. Tough cookies.
fc417fc802•6mo ago
People are downvoting you but it's true. Ghibli is just the highest profile studio that creates work in that general style. Arguably most of the highest quality examples of that style are their work. However they're far from the only practitioners.
Zopieux•6mo ago
Welcome to the internet, which is for porn (and cat pictures).
numpad0•6mo ago
It's all too cringe. The AI creativity space is chock full of a cringey cargo-cult parody of the "no such thing as bad publicity" strategy. Things on the Internet get reposted to death, so what's wrong if we use them, and what even is copyright? Everybody hates AI-generated images, but sure, that's how you get the word out. Pornography drives adoption, so let them have some; it should work.

Those behaviors might appear correct in an extremely superficial sense, but it is as if they prompted themselves for "man eating cookies" and ended up with something akin to the early Will Smith pasta GIFs. Whatever they're doing, even assuming it's cookies held in hands, they're not eating them.

Jackson__•6mo ago
> Miyazaki has said over and over that he hates AI image generation

No he has not. He was talking about an AI model that was shown off for crudely animating 3D people in 2016, in a way that he found creepy. If you watch the actual video, you can see the examples that likely set him off here[0].

[0] https://youtu.be/ngZ0K3lWKRc&t=7

cedws•6mo ago
It's just really distasteful and unoriginal. I cringe inside whenever I see a Ghibli-style profile picture. Have some originality, for god's sake.

Leading by example by not condoning copying artists' styles would be a simple polite gesture.

artninja1988•6mo ago
How censored is it?
Zopieux•6mo ago
I love that this is the only thing the community wants to know at every announcement of a new model, but no organization wants to face the crude reality of human nature.

That, and the weird prudishness of most american people and companies.

balivandi•6mo ago
My wife and I started a children’s jewelry business, and I’ve wanted to use AI to show children wearing our jewelry. Every time I try, I get either ridiculous results or hit some artificial censorship wall about making images with children.

I would really like to find a way to do this (either online or locally) if anyone has any tips for giving a model some images of real jewelry with dimensions (and if needed even photographed or generated children) and having the model accurately place the jewelry on the kids.

pyth0•6mo ago
If you have the jewelry and the children, why do you not just take a real photo?
balivandi•6mo ago
Because it is extremely time consuming (and expensive) to do that. The logistics are very challenging with finding a variety of child models, environments/studio, outfits, lighting, cameras, photo processing, etc...

And then you have to do it all over again every few months as the products and the seasons change!

vunderba•6mo ago
Good release! I've added it to the GenAI Showdown site. Overall a pretty good model scoring around 40% - and definitely represents SOTA for something that could be reasonably hosted on consumer GPU hardware (even more so when its quantized).

That being said, it still lags pretty far behind OpenAI's gpt-image-1 strictly in terms of prompt adherence for txt2img prompting. However as has already been mentioned elsewhere in the thread, this model can do a lot more around editing, etc.

https://genai-showdown.specr.net

cubefox•6mo ago
Side remark: I don't think it's appropriate to mix Imagen 3 and 4. Those are two different models.
vunderba•6mo ago
Even though I didn't see a significant improvement over Imagen3 in adherence, I agree. Initially the page was just getting a bit crowded but now that I've added a "Show/Hide Models" toggle I'll go ahead and make that change.
cubefox•6mo ago
Yeah. There is also "Imagen 4 Ultra" (costs 50% more in the Gemini API), though I don't know whether it makes a significant difference.
masfuerte•6mo ago
> In this case, the paper is less than one-tenth of the entire image, and the paragraph of text is relatively long, but the model still accurately generates the text on the paper.

Nope. The text includes the line "That dawn will bloom" but the render reads "That down will bloom", which is meaningless.

sciencesama•6mo ago
What's the lowest graphics card that can support this self-hosted with reasonable output?
BoredPositron•6mo ago
Probably a 4080 when the nunchaku quants drop.
blacktechnology•6mo ago
Is this an official site? https://qwen-image.ai
b-lee•6mo ago
Just go to chat.qwen.ai and click on "Image Generation" below the text input.
cadamsdotcom•6mo ago
A beast. Supposedly beats GPT-4o in image generation and Flux Kontext in image editing.

If it's as good as they say, one less reason for that ChatGPT sub.

wg0•6mo ago
Jaw dropping. Because text rendering isn't easy even with regular programming SDKs etc.

Anyone thinking otherwise hasn't attempted implementing it or hasn't thought about it in depth.

metadat•6mo ago
I just tested it out; very impressive results. I wonder what the Qwen team did behind the scenes to make this work so well.

https://chat.qwen.ai/

(Select "Image Generation" and be sure to use the Qwen3-235B model; I also tried selecting "Coder" but it errors out.)

android521•6mo ago
None of the image models could handle showing time, like generating a clock showing 3:15 pm.
metadat•6mo ago
Yeah, clock times are a notorious challenge for LLMs due to training data almost always showing aesthetically appealing times like 10-and-2.

An entire thread on this subject previously unfolded on HN but I can't find it at this time!

james_a_craig•6mo ago
In their own first example of English text rendering, it has mistakenly rendered "The silent patient" as "The silent Patient" and "The night circus" as "The night Circus", and miskerned "When stars are scattered" as "When stars are sca t t e r e d".

The example further down has "down" not "dawn" in the poem.

For these to be their hero image examples, they're fairly poor; I know it's a significant improvement vs. many of the other current offerings, but it's clear the bar is still being set very low.

sixhobbits•6mo ago
Given that it was literally a few months ago when these models could barely do text at all, it seems like the bar just gets higher with each advancement, no matter how impressive.
animal531•6mo ago
For my first attempt I plugged in text and a description of a small new Unity package I'm working on, and it matched the intent/text extremely well.

There were a few small text mistakes and the image isn't quite as good as I've seen before, but overall it delivers on its promise.

pradn•6mo ago
A silly question: do any of these models generate pixels and also vector overlays? I don't see why we need to solve the text problem pixel-for-pixel if we can just generate higher-level descriptions of the text (text, font, font size, etc). Ofc, it won't work in all situations, but it will result in high fidelity for common business cases (flyers, websites, brochures, etc).
fahhem•6mo ago
Still waiting for a model that generates 2D/3D environments to get rendered by tools like Blender
doubtfuluser•6mo ago
Can it generate images of a lone person standing in front of a column of tanks on Tiananmen Square?
doubtfuluser•6mo ago
I’m seriously getting worried that we use models without openly discussing any potential shortcomings they have. We should somewhere have a list of models and their issues.
qingcharles•6mo ago
It told me "Content Security Warning: The input text data may contain inappropriate content."
yh26yh•6mo ago
Very interesting. People here talk a lot about censorship. However, I just replied with something about AIPAC, and my reply has been deleted. lol
qingcharles•6mo ago
Pelican riding a bicycle image came out really nicely: (it added the text)

https://cdn.qwenlm.ai/output/wV13g6892e758082439d7000d439ed5...

laiwuchiyuan•6mo ago
“Qwen‑Image: Open‑source 20 B MMDiT model with stunning text rendering and image editing. Effortlessly create bilingual posters, infographics, slides, infill edits, comics.”

Go experience AI: https://www.qwenimagen.com/

Why it works:
- Highlights open-source nature and 20 billion-parameter strength
- Emphasizes its superior multilingual, layout-aware text rendering
- Mentions real-world use cases: posters, slides, graphics, image editing, comics/info visuals