frontpage.

Made with ♥ by @iamnishanth

Open Source @Github


AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•23s ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
1•andreabat•2m ago•0 comments

I Was Trapped in Chinese Mafia Crypto Slavery [video]

https://www.youtube.com/watch?v=zOcNaWmmn0A
1•mgh2•8m ago•0 comments

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

https://www.cbp.gov/newsroom/stats/reported-employee-arrests
1•ludicrousdispla•10m ago•0 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•15m ago•1 comment

Show HN: SVGV – A Real-Time Vector Video Format for Budget Hardware

https://github.com/thealidev/VectorVision-SVGV
1•thealidev•17m ago•0 comments

Study of 150 developers shows AI generated code no harder to maintain long term

https://www.youtube.com/watch?v=b9EbCb5A408
1•lifeisstillgood•17m ago•0 comments

Spotify now requires premium accounts for developer mode API access

https://www.neowin.net/news/spotify-now-requires-premium-accounts-for-developer-mode-api-access/
1•bundie•20m ago•0 comments

When Albert Einstein Moved to Princeton

https://twitter.com/Math_files/status/2020017485815456224
1•keepamovin•22m ago•0 comments

Agents.md as a Dark Signal

https://joshmock.com/post/2026-agents-md-as-a-dark-signal/
1•birdculture•23m ago•0 comments

System time, clocks, and their syncing in macOS

https://eclecticlight.co/2025/05/21/system-time-clocks-and-their-syncing-in-macos/
1•fanf2•25m ago•0 comments

McCLIM and 7GUIs – Part 1: The Counter

https://turtleware.eu/posts/McCLIM-and-7GUIs---Part-1-The-Counter.html
1•ramenbytes•27m ago•0 comments

So whats the next word, then? Almost-no-math intro to transformer models

https://matthias-kainer.de/blog/posts/so-whats-the-next-word-then-/
1•oesimania•29m ago•0 comments

Ed Zitron: The Hater's Guide to Microsoft

https://bsky.app/profile/edzitron.com/post/3me7ibeym2c2n
2•vintagedave•32m ago•1 comment

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
1•__natty__•32m ago•0 comments

Show HN: Android-based audio player for seniors – Homer Audio Player

https://homeraudioplayer.app
3•cinusek•33m ago•0 comments

Starter Template for Ory Kratos

https://github.com/Samuelk0nrad/docker-ory
1•samuel_0xK•34m ago•0 comments

LLMs are powerful, but enterprises are deterministic by nature

2•prateekdalal•38m ago•0 comments

Make your iPad 3 a touchscreen for your computer

https://github.com/lemonjesus/ipad-touch-screen
2•0y•43m ago•1 comment

Internationalization and Localization in the Age of Agents

https://myblog.ru/internationalization-and-localization-in-the-age-of-agents
1•xenator•43m ago•0 comments

Building a Custom Clawdbot Workflow to Automate Website Creation

https://seedance2api.org/
1•pekingzcc•46m ago•1 comment

Why the "Taiwan Dome" won't survive a Chinese attack

https://www.lowyinstitute.org/the-interpreter/why-taiwan-dome-won-t-survive-chinese-attack
2•ryan_j_naughton•46m ago•0 comments

Xkcd: Game AIs

https://xkcd.com/1002/
1•ravenical•48m ago•0 comments

Windows 11 is finally killing off legacy printer drivers in 2026

https://www.windowscentral.com/microsoft/windows-11/windows-11-finally-pulls-the-plug-on-legacy-p...
1•ValdikSS•48m ago•0 comments

From Offloading to Engagement (Study on Generative AI)

https://www.mdpi.com/2306-5729/10/11/172
1•boshomi•50m ago•1 comment

AI for People

https://justsitandgrin.im/posts/ai-for-people/
1•dive•51m ago•0 comments

Rome is studded with cannon balls (2022)

https://essenceofrome.com/rome-is-studded-with-cannon-balls
1•thomassmith65•56m ago•0 comments

8-piece tablebase development on Lichess (op1 partial)

https://lichess.org/@/Lichess/blog/op1-partial-8-piece-tablebase-available/1ptPBDpC
2•somethingp•58m ago•0 comments

US to bankroll far-right think tanks in Europe against digital laws

https://www.brusselstimes.com/1957195/us-to-fund-far-right-forces-in-europe-tbtb
4•saubeidl•59m ago•0 comments

Ask HN: Have AI companies replaced their own SaaS usage with agents?

1•tuxpenguine•1h ago•0 comments

Pico-Banana-400k

https://github.com/apple/pico-banana-400k
395•dvrp•3mo ago

Comments

vunderba•3mo ago
From the paper

> The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.

Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that does something similar: it takes a starting prompt, a list of GenAI models, and a maximum number of retries.

It generates and evaluates images using a separate multimodal AI, then rewrites failed prompts automatically, repeating up to the set limit.

It's not perfect (the nine-pointed star example in particular), but often the recognition side of a multimodal model is superior to its generative capabilities, so you can run it in a sort of REPL until you get the desired outcome.

https://genai-showdown.specr.net/image-editing
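The loop described above can be sketched roughly like this; every function here is a hypothetical stand-in for a real GenAI or judge call, not an actual API:

```python
# Sketch of the generate / evaluate / rewrite loop. All three helper
# functions are made-up stand-ins for real model calls.

def generate_image(model: str, prompt: str) -> str:
    """Stand-in for a text-to-image call; returns an image identifier."""
    return f"{model}:{prompt}"

def evaluate_image(image: str, goal: str) -> bool:
    """Stand-in for the separate multimodal judge."""
    return goal in image

def rewrite_prompt(prompt: str, attempt: int) -> str:
    """Stand-in for asking the judge model to repair a failed prompt."""
    return f"{prompt} (rewrite {attempt})"

def generate_with_retries(model, prompt, goal, max_retries=3):
    """Generate, judge, and rewrite until success or the retry cap."""
    current = prompt
    for attempt in range(1, max_retries + 1):
        image = generate_image(model, current)
        if evaluate_image(image, goal):
            return image, attempt
        current = rewrite_prompt(current, attempt)
    return None, max_retries
```

The real version would plug in actual generation and judging endpoints; the control flow is the interesting part.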

typpilol•3mo ago
I love your site - I stumble across it once a month, it seems.

Or there's another very similar site, but I'm pretty sure it's yours.

vunderba•3mo ago
Thanks! It's probably the same site. It used to only be a showdown of text-to-image models (Flux, Imagen, Midjourney, etc), but once there was a decent number of image-to-image models (Kontext, Seedream, Nano-Banana) I added a nav bar at the top so I could do similar comparisons for image editing.
typpilol•3mo ago
Yes that was exactly it.

How often do you update it? It seems like something new every time I check. Or I forget everything..

vunderba•3mo ago
Honestly it's kind of inconsistent. Model releases sometimes seem to come in flurries - (it felt like Seedream and Nano-banana were within a few weeks of each other for example) and then the site will receive a pretty big update.
lukasb•3mo ago
What do you use for evaluation? gemini-2.5-pro is at the top of MMLU and has been best for me but always looking for better.
vunderba•3mo ago
Recently I've found myself getting the evaluation simultaneously from OpenAI GPT-5, Gemini 2.5 Pro, and Qwen3 VL to give it a kind of "voting system". Purely anecdotal, but I do find that Gemini is the most consistent of the three.
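A minimal sketch of such a voting scheme, assuming each judge returns a score in [0, 1]; the model names and pass threshold are illustrative, not from the comment:

```python
from statistics import median

def vote(scores: dict, threshold: float = 0.5) -> bool:
    """Majority vote: pass if more than half the judges score
    at or above the threshold."""
    passes = sum(1 for s in scores.values() if s >= threshold)
    return passes > len(scores) / 2

def consensus_score(scores: dict) -> float:
    """Median score across judges, robust to one outlier judge."""
    return median(scores.values())
```

With three judges, a single inconsistent grader can't flip the pass/fail outcome, which is the appeal of voting over trusting one evaluator.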
lukasb•3mo ago
Interesting, I'll give voting a shot, thanks.
motbus3•3mo ago
I'm running a similar experiment, but so far changing the seed on OpenAI's model seems to give similar results. If that's confirmed, it's concerning how sensitive it could be.
dangoodmanUT•3mo ago
I found the opposite. GPT-5 is better at judging along a true gradient of scores, while Gemini loves to pick 100%, 20%, 10%, 5%, or 0%. You never get, say, an 87% score.
svantana•3mo ago
That's a great website! Feature request: a button to toggle all the sliders left or right at the same time - would make it easier to glance the results without lots of finicky mouse moves.
MattRix•3mo ago
Seconding this. Once you’ve seen the original image once, you don’t need to see it each time. The idea of syncing the sliders in the current group is a clever solution.
vunderba•3mo ago
Thanks. That's a great idea - I also incorporated @MattRix's proposal of syncing the sliders. It should be up now!
scotty79•3mo ago
Seedream seems to be the clear winner.
vednig•3mo ago
Other Post: https://news.ycombinator.com/item?id=45708493
cjrd•3mo ago
Eh
gus_massa•3mo ago
It looks like a post about the presentation at the conference. No discussion. Sometimes the first post about a topic doesn't get traction but a later post gets more popular.
daemonologist•3mo ago
I confess that I don't quite get the point here - is it just that they've paid the inference costs for a dataset that can be used for distillation/other research?
peddling-brink•3mo ago
Essentially yes, it’s a data set that can help train or fine tune another model or similar research. From the site:

> Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing. Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.

TechSquidTV•3mo ago
Can it be? Has Apple FINALLY joined the party? It's very ironic that they're using an open dataset from Google... and Google's Gemini for prompts.

I'm happy to see something from Apple but this seems so low-tech that it could be one of my own local ComfyUI workflows.

echelon•3mo ago
They're distilling Nano Banana with a Google dataset, letting anyone more easily build and test their own systems. It's kind of funny how easy this is to do.

"You wouldn't steal a car," but anyone can distill an expensive, fully trained model in order to build their own.

This is going to be one of the most important categories of image model. It's good that we have more than Google and the Chinese (ByteDance, et al) with competent editing models. I don't think Flux Kontext is keeping up.

It'd be really nice if we had a Nano Banana-caliber model as open source.

kranke155•3mo ago
Flux Kontext is not keeping up. Even then, Flux has become only partially open source: they keep the more advanced models API-only.
ttul•3mo ago
Flux backbone is too rigid. Very difficult to fine tune. Qwen is where it’s at these days.
ThrowawayTestr•3mo ago
I've gotten mind blowing results with flux.1 Dev. Is the API even better?
TechSquidTV•3mo ago
Qwen Image Edit? Though it is a little soft and plasticky.
ttul•3mo ago
Unless you fine-tune it… the folks behind Qwen are amazing.
skissane•3mo ago
The license is CC BY-NC-ND - I’m not sure who is going to be able to use it given the NC-ND part… especially given the potential uncertainty over what uses count as commercial and what counts as derivative works. OTOH, given the bulk of this dataset is AI outputs, its copyrightability is an open question.
hsbauauvhabzb•3mo ago
I find caring about a licence for an LLM highly ironic.
littlestymaar•3mo ago
It's not even an LLM, it's a dataset.

And clearly, if training on copyrighted material is fair use, as every LLM maker claims, then this license has literally no weight.

Also, NAL but IIRC an automatically generated dataset isn't copyrightable in the first place.

DrewADesign•3mo ago
Honor among thieves?
niek_pas•3mo ago
> CC-BY-NC-ND or Creative Commons Attribution NonCommercial NoDerivs, is the most restrictive license offered by Creative Commons. With this license, the user (while attributing the original creator) can only share the work but not change it in any way or ever use it commercially.
qoez•3mo ago
Output from a generative AI model has already been deemed non copyrightable and the license can't really overwrite that
poly2it•3mo ago
Worldwide?
skissane•3mo ago
I don’t think anyone really knows the answer yet. UK law has much looser standards for copyrightability than US law - UK law accepts the “sweat of the brow” doctrine - mere human effort is enough to create copyright, even if it lacks any significant creative element-under UK law, a transcriptionist transcribing an audio recording creates a new copyright in the transcription separate from the copyright in the audio itself; US law does not consider a mere verbatim transcription to be sufficiently original to create a new copyright. But, will UK judges extend “sweat of the brow” to include AI sweat as well as human sweat? My gut feel is probably “yes”, but I’m not aware of any case law on the topic yet. A complicating factor is there are a lot of wealthy vested interests who are going to be pushing for the law in this area to evolve in a way which suits them - both in the courts and in Parliament - so the law might not evolve in the way you’d expect if judges were just left to logically extend existing precedents.

Even in the US, I think the situation is complex. If I prompt an LLM to edit a copyrighted human-written text, the LLM output is going to be copyrighted, because even if the LLM’s changes aren’t copyrightable, the underlying text is. And what happens if an LLM proposes edits, and then a human uses their own judgement to decide which LLM edits to accept and which not to? That act of human judgement might provide grounds for copyrightability which weren’t present in the raw LLM output.

BarakWidawsky•3mo ago
Looks like the dataset is distilled from Gemini nano-banana

Definitely very useful, but I’m so curious how the original datasets from these image editing models were created. I’m guessing a lot of it is synthetic data to construct scenes programmatically with layers

ttul•3mo ago
My rough guess is that they set up a few workflows combining analytical and ML-based image manipulations to generate the training set. For instance, you can get a long way by having a segmentation model identify and mask various objects, then applying simple analytical manipulations to the masked areas, such as changing their color or diffusing new content into the area using masked guidance with another image diffusion model. In this way, you can create training pairs that your editing model learns to invert, such as "turn the woman's hair into blonde hair" (start with a blonde-haired woman, mask the hair, and get a diffusion model to turn it brown; this gives you the scene you can now invert as a training pair).
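The mask-and-invert idea can be sketched abstractly. Here the image is reduced to a dict of named regions, and all the real segmentation and diffusion calls are elided; this only shows how the easy forward edit becomes a training pair in the hard, inverse direction:

```python
def make_inverse_pair(image: dict, region: str, new_value: str,
                      instruction: str) -> dict:
    """Apply an easy forward edit to one region, then emit the pair
    in the inverse direction for the editing model to learn.

    `image` stands in for a real photo; in practice a segmentation
    model would find `region` and a diffusion model would repaint it.
    """
    edited = dict(image)
    edited[region] = new_value  # e.g. hair: blonde -> brown (easy)
    # Training pair: given `edited` plus `instruction`, the editing
    # model must reproduce the original image (hard direction).
    return {"input": edited, "instruction": instruction, "target": image}
```

Because the forward edit is cheap and automatic, this scales to very large pair counts.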
zuInnp•3mo ago
Maybe it's only me, but all the emojis in the readme look like AI wrote it, and they instantly make me stop reading it ...
thinkingemote•3mo ago
Another glaring giveaway is the overuse of numbered lists and bullet-point lists.

Personally it makes me less likely to read it but the content might be useful. I have some general tech interest but am not overwhelmingly interested in the subject. Sometimes good things crop up on HN too.

Now, if an author were writing for an audience with the intention of attracting people who are not yet enthusiasts and turning them into enthusiasts of their product, they would create something readable and attractive. The LLM hasn't done that here.

Together, this leads me to think that the readme is not for me but is just for dedicated enthusiasts.

exadeci•3mo ago
I've personally always preferred numbered lists and bullet points rather than wordy paragraphs that say the same thing.

I guess that makes me an LLM

stefan_•3mo ago
All the READMEs these days are such a tell. It's okay when explicitly prompted, but now thanks to reinforcement learning through people who have no clue, all the models just top off every change with some pointless documentation change.
ThrowawayTestr•3mo ago
I've seen so many repos with bare readmes that I don't even mind generated ones.
cubefox•3mo ago
Any idea why they didn't use GPT-4o image generation?
Alifatisk•3mo ago
I think it's because Gemini's Nano Banana is better than 4o imagegen at creating and editing images from instructions.
Jackson__•3mo ago
Because GPT-4o is too orange (literally).
neom•3mo ago
Someone who works in AI told me they think that was trained in as a "watermark", apparently the same is true with the em-dashes, to "ease people into AI" or something.
svantana•3mo ago
Gemini is better. See results of double-blind evaluation here:

https://lmarena.ai/leaderboard/image-edit

pwython•3mo ago
Valid question, as they already have a partnership with OpenAI to use ChatGPT in Siri. I personally use GPT for illustrations and Nano Banana for photo edits (Midjourney for realistic photos).

As an aside, perhaps they're using GPT/Codex for coding. Did anyone else notice the use of emojis and → in their code?

whywhywhywhy•3mo ago
4o generates new images that try to be close but aren't exact; it doesn't edit existing ones, so it wouldn't be much use for this.
exadeci•3mo ago
Someone here shared an interesting comparison of model performance, and Gemini does seem much better

https://genai-showdown.specr.net/image-editing

sollewitt•3mo ago
AI industry: please _please_ get it together with naming. There shouldn’t be this much overlap between this, a dataset, and a massive image model which was already given a garbage name to begin with.

Don’t get me started in how “agent” is a term of art that means absolutely nothing, encompassing everything from a plain old shell script to a full language model.

3abiton•3mo ago
To be fair, as the "AI" industry came late, the dibs were called on all the cool acronyms/names (LoRa for example).
andai•3mo ago
That's true. And if they did somehow think of a cool name, they simply wouldn't have the resources to purchase the rights to it ;)
NBJack•3mo ago
Is it just me, or do they switch the names a bit as they go along? Maybe I just missed something?

> Dataset Statistics

> Nano-Banana-400K contains ~400K image editing data, covering a wide visual and semantic range drawn from real-world imagery.

fritzo•3mo ago
Missed opportunity to call it banana-seeds-400k
watersb•3mo ago
STILL WAITING FOR "BANANA-JR-6000"
MeteorMarc•3mo ago
Thought this was about Raspberry Pi because of the associations with the Banana Pi and the Pi Pico.
3form•3mo ago
And 400. I genuinely thought it was going to be a Banana Pi in keyboard form factor.
ttul•3mo ago
Image editing model training is fascinating. One method for training image editing models involves using a second model to apply the inverse of the change you want the model to learn. Typically, the task you’re asking the second model to perform is easy, whereas the inverse task is difficult.

For example, you might ask the second model to cover the person's face with a black square; a VLM notes that the person is a man with brown hair and round glasses. Then, during training, the resulting image is presented along with the prompt, "Remove the black square from the man's face. He has brown hair and round glasses."

The model now learns how to remove black squares and replace them with a man’s face with brown hair and round glasses.

Since the training data is easily synthesized using existing models, you can generate enormous amounts of it - often very cheaply. For specialized editing tasks, this technique is really powerful. Build your training set for your special purpose task, fine tune an existing image editing model such as Qwen Image Edit to produce a new checkpoint or LoRA (often a LoRA is more than good enough) and then you have a special purpose model to perform whatever narrow editing task you need it to perform on your image data.
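As a toy illustration of the black-square example above, treating the image as a plain 2D list of pixel values (a real pipeline would use a face detector and a VLM caption; the function names here are made up):

```python
def draw_black_square(photo, box):
    """Occlude a rectangular region. `photo` is a 2D list of pixel
    values; `box` is (row_start, row_end, col_start, col_end)."""
    out = [row[:] for row in photo]  # copy so the original survives
    r0, r1, c0, c1 = box
    for r in range(r0, r1):
        for c in range(c0, c1):
            out[r][c] = 0
    return out

def make_training_pair(photo, face_box, vlm_caption):
    """Easy forward edit (draw square) plus a VLM description yields
    a pair teaching the hard inverse edit (restore the face)."""
    occluded = draw_black_square(photo, face_box)
    prompt = "Remove the black square from the man's face. " + vlm_caption
    return {"input": occluded, "prompt": prompt, "target": photo}
```

The occlusion step is trivial and deterministic, which is exactly why the synthesized data is so cheap: all the difficulty is pushed onto the model being trained.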

onlyrealcuzzo•3mo ago
Are these models built atop models that already understand natural language?

If the commands all follow the same syntax, it's easy to imagine how you can generate a good training set.

But how do they fully grasp natural language to be able to perform tasks worded unexpectedly, which would be easy to parse, if they understood natural language?

jerf•3mo ago
"But how do they fully grasp natural language to be able to perform tasks worded unexpectedly, which would be easy to parse, if they understood natural language?"

A Large Language Model. Pardon me for spelling out the full acronym, but it is what it is for a reason.

I think a lot of the whiz-bang applications of LLMs have drowned it out, but LLMs are effectively the solution to the long-standing problem of natural language understanding, and that alone would be enough to make them a ground-breaking technology. Taking English text and translating it with very high fidelity into the vector space these models understand is amazing and I think somewhat underappreciated.

ttul•3mo ago
Yes, the newer image and video editing models have an LLM bolted onto them. The rich embeddings from the LLM are fed into a diffusion transformer (DiT) alongside a tokenized version of the input image. These two streams “tell” the model what to do.
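At the shape level, that conditioning amounts to concatenating the two token streams before the DiT blocks attend over the joint sequence. The dimensions below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dimensions: 8 text tokens from the bolted-on LLM and
# 16 patch tokens from the input image, both projected to width 64.
text_emb = rng.normal(size=(8, 64))       # rich LLM embeddings
image_tokens = rng.normal(size=(16, 64))  # tokenized input image

# A DiT block attends over the joint sequence, so both streams
# can "tell" the model what to do.
joint = np.concatenate([text_emb, image_tokens], axis=0)
```

Real models differ in how the streams are mixed (cross-attention vs. joint self-attention), but the shared-width token sequence is the common shape of the interface.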
ThrowawayTestr•3mo ago
I love open datasets. The future of LLMs is open-source models.