frontpage.

AI World Clocks

https://clocks.brianmoore.com/
186•waxpancake•1h ago•112 comments

A race condition in Aurora RDS

https://hightouch.com/blog/uncovering-a-race-condition-in-aurora-rds
108•theanomaly•1h ago•38 comments

Manganese is Lyme disease's double-edge sword

https://news.northwestern.edu/stories/2025/11/manganese-is-lyme-diseases-double-edge-sword
76•gmays•3h ago•16 comments

The disguised return of EU Chat Control

https://reclaimthenet.org/the-disguised-return-of-the-eus-private-message-scanning-plot
240•egorfine•2h ago•125 comments

Show HN: Tiny Diffusion – A character-level text diffusion model from scratch

https://github.com/nathan-barry/tiny-diffusion
29•nathan-barry•4d ago•4 comments

Minisforum Stuffs Entire Arm Homelab in the MS-R1

https://www.jeffgeerling.com/blog/2025/minisforum-stuffs-entire-arm-homelab-ms-r1
24•kencausey•1h ago•16 comments

Structured Outputs on the Claude Developer Platform (API)

https://www.claude.com/blog/structured-outputs-on-the-claude-developer-platform
17•adocomplete•1h ago•11 comments

Awk Technical Notes (2023)

https://maximullaris.com/awk_tech_notes.html
34•signa11•1w ago•5 comments

Bitchat for Gaza – messaging without internet

https://updates.techforpalestine.org/bitchat-for-gaza-messaging-without-internet/
163•ciconia•2h ago•64 comments

Incus-OS: Immutable Linux OS to run Incus as a hypervisor

https://linuxcontainers.org/incus-os/
114•_kb•1w ago•38 comments

AGI fantasy is a blocker to actual engineering

https://www.tomwphillips.co.uk/2025/11/agi-fantasy-is-a-blocker-to-actual-engineering/
457•tomwphillips•6h ago•426 comments

US Tech Market Treemap

https://caplocus.com/
46•gwintrob•3h ago•18 comments

RetailReady (YC W24) Is Hiring

https://www.ycombinator.com/companies/retailready/jobs/kGHAith-support-engineer
1•sarah74•3h ago

Meeting notes between Forgejo and the Dutch government via Git commits

https://codeberg.org/forgejo/sustainability/pulls/137/files
71•speckx•2h ago•27 comments

Honda: 2 years of ML vs. 1 month of prompting - here's what we learned

https://www.levs.fyi/blog/2-years-of-ml-vs-1-month-of-prompting/
243•Ostatnigrosh•4d ago•88 comments

GPG and Me (2015)

https://moxie.org/2015/02/24/gpg-and-me.html
11•cl3misch•3d ago•1 comment

Magit manuals are available online again

https://github.com/magit/magit/issues/5472
94•vetronauta•8h ago•34 comments

Show HN: Chirp – Local Windows dictation with ParakeetV3 no executable required

https://github.com/Whamp/chirp
6•whamp•1h ago•2 comments

Germany to ban Huawei from future 6G network

https://www.bloomberg.com/news/articles/2025-11-13/germany-to-ban-huawei-from-future-6g-network-i...
103•teleforce•2h ago•82 comments

Linear Algebra Explains Why Some Words Are Effectively Untranslatable

https://aethermug.com/posts/linear-algebra-explains-why-some-words-are-effectively-untranslatable
74•mrcgnc•5h ago•51 comments

Winamp clone in Swift for macOS

https://github.com/mgreenwood1001/winamp
122•hyperbole•7h ago•92 comments

EDE: Small and Fast Desktop Environment (2014)

https://edeproject.org/
76•bradley_taunt•7h ago•30 comments

Being poor vs. being broke

https://blog.ctms.me/posts/2025-11-14-being-poor-or-being-broke/
288•speckx•3h ago•319 comments

I think nobody wants AI in Firefox, Mozilla

https://manualdousuario.net/en/mozilla-firefox-window-ai/
1028•rpgbr•6h ago•632 comments

Operating Margins

https://fi-le.net/margin/
232•fi-le•5d ago•88 comments

'No One Lives Forever' turns 25 and you still can't buy it legitimately

https://www.techdirt.com/2025/11/13/no-one-lives-forever-turns-25-you-still-cant-buy-it-legitimat...
103•speckx•3h ago•65 comments

Show HN: Dumbass Business Ideas

https://dumbassideas.com
15•elysionmind•2h ago•12 comments

Scientists Produce Powerhouse Pigment Behind Octopus Camouflage

https://today.ucsd.edu/story/scientists-produce-powerhouse-pigment-behind-octopus-camouflage
60•gmays•4d ago•5 comments

Nvidia is gearing up to sell servers instead of just GPUs and components

https://www.tomshardware.com/tech-industry/artificial-intelligence/jp-morgan-says-nvidia-is-geari...
150•giuliomagnifico•7h ago•64 comments

Moving Back to a Tiling WM – XMonad

https://wssite.vercel.app/blog/moving-back-to-a-tiling-wm-xmonad
51•weirdsmiley•3h ago•50 comments

AI World Clocks

https://clocks.brianmoore.com/
183•waxpancake•1h ago
"Every minute, a new clock is rendered by nine different AI models."

Comments

kfarr•1h ago
Add some voting and you got yourself an AI World Clock arena! https://artificialanalysis.ai/image/arena
syx•1h ago
I’m very curious about the monthly bill for such a creative project; surely some of these are pre-rendered?
coffeecoders•56m ago
Napkin math:

9 AIs × 43,200 minutes = 388,800 requests/month

388,800 requests × 200 tokens = 77,760,000 tokens/month ≈ 78M tokens

Cost varies from 10 cents to $1 per 1M tokens.

Using the mid-price, the cost is around $50/month.

---

Hopefully, the OP has this endpoint protected - https://clocks.brianmoore.com/api/clocks?time=11:19AM
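For anyone who wants to rerun that napkin math, here is a sketch. The 200-tokens-per-clock figure and the $0.10–$1.00 per-1M-token price band are the comment's assumptions, not measured rates:

```python
# Re-derive the estimate above. Assumptions (from the comment, not measured):
# 9 models, one request per minute, ~200 output tokens per clock,
# and a price band of $0.10-$1.00 per 1M tokens (midpoint used here).
MODELS = 9
MINUTES_PER_MONTH = 60 * 24 * 30        # 43,200 minutes in a 30-day month
TOKENS_PER_REQUEST = 200
MID_PRICE_PER_1M = (0.10 + 1.00) / 2    # $0.55 per 1M tokens

requests = MODELS * MINUTES_PER_MONTH   # 388,800 requests/month
tokens = requests * TOKENS_PER_REQUEST  # 77,760,000, about 78M tokens/month
cost = tokens / 1_000_000 * MID_PRICE_PER_1M

print(f"{requests:,} requests, {tokens:,} tokens, ~${cost:.0f}/month")
```

This lands at roughly $43/month, the same ballpark as the comment's ~$50 estimate.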

ugh123•1h ago
Cool, and marginally informative on the current state of things, but kind of a waste of energy given everything is re-done every minute to compare. We'd probably only need a handful of each to see the meaningful differences.
whoisjuan•1h ago
It's actually quite fascinating if you watch it for 5 minutes. Some models are overall bad, but others nail it in one minute and butcher it in the next.

It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.

alister•59m ago
> model drift driven by just small, seemingly unimportant changes to the prompt

What changes to the prompt are you referring to?

According to the comment on the site, the prompt is the following:

Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

The prompt doesn't seem to change.

sambaumann•49m ago
presumably the time is replaced with the actual current time at each generation. I wonder if they are actually generated every minute or if all 6,480 permutations (720 minutes on a 12-hour face × 9 LLMs) were pre-generated and are just shown on a schedule
whoisjuan•33m ago
The time given to the model. So the difference between two generations is just something trivially different, like "12:35" vs. "12:36".
moffkalast•51m ago
Kimi seems to be the only reliable one, which is a bit surprising, and GPT-4o is consistently better than GPT-5, which on the other hand is unfortunately not surprising at all.
nbaugh1•41m ago
It is really interesting to watch them for a while. QWEN keeps outputting some really abstract interpretations of a clock, KIMI is consistently very good, GPT5's results line up exactly with my experience with its code output (overly complex and never working correctly)
ascorbic•1h ago
The energy usage is minuscule.
jdiff•1h ago
It's wasteful. If someone built a clock out of 47 microservices that called out to 193 APIs to check the current time, location, time zone, and preferred display format we'd rightfully criticize it for similar reasons.

In a world where Javascript and Electron are still getting (again, rightfully) skewered for inefficiency despite often exceeding the performance of many compiled languages, we should not dismiss the discussion around efficiency so easily.

Arisaka1•56m ago
What I find amusing about this argument is that no one ever brought up power savings when, e.g., someone used "let me Google that for you" instead of giving the answer to their question, because we saw the utility of teaching others how to Google. But apparently we can't see the utility of measuring the oversold competence of current AI models, given a sufficiently large sample size.
saulpw•41m ago
Let's do some math.

60x24x30 = 40k AI calls per month per model. Let's suppose there are 1000 output tokens (might it be 10k tokens? Seems like a lot for this task). So 40m tokens per model.

The price for 1m output tokens[0] ranges from $.10 (qwen-2.5) to $60 (GPT-4). So $4/mo for the cheapest, and $2.5k/mo for the most expensive.

So this might cost several thousand dollars a month? Something smells funny. But you're right, throttling it to once an hour would achieve a similar goal and likely cost less than $100/mo (which is still more than I would spend on a project like this).

[0] https://pricepertoken.com/
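A quick re-derivation of that range, using the comment's own assumptions (one call per minute per model, 1,000 output tokens per call, and the quoted $0.10 and $60 per-1M-token endpoints):

```python
# Per-model monthly cost range using the comment's own assumptions.
calls_per_month = 60 * 24 * 30               # 43,200, the "40k" above
tokens_per_month = calls_per_month * 1_000   # ~43.2M output tokens

cheapest = tokens_per_month / 1e6 * 0.10     # the qwen-2.5 price point
priciest = tokens_per_month / 1e6 * 60.00    # the GPT-4 price point

print(f"${cheapest:.2f}-${priciest:,.0f} per model per month")
```

That gives about $4.32 at the low end and about $2,592 at the high end, matching the "$4/mo" and "$2.5k/mo" figures above.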

berkes•23m ago
Yes it is wasteful.

But I presume you light up Christmas lights in December, drive to the theater to watch a movie, or fire up a campfire on holiday. That too is "wasteful": it's not needed, and other, far more efficient ways exist to achieve the same. And in absolute numbers, those are far more energy intensive than running an LLM to create 9 clocks every minute. We do things to learn, have fun, be weird, make art, or just spend time.

Now, if Rolex starts building watches by running an LLM to drive its production machines or if we replace millions of wall clocks with ones that "Run an LLM every second", then sure, the waste is an actual problem.

The point I'm trying to make is that it's OK to consider or debate the energy use of LLMs compared to alternatives. But bringing up that debate in a context where someone is being creative, or having fun, is not, IMO. A lot of "fun" activities use a lot of energy, and that isn't automatically "wasteful" either.

energy123•28m ago
I sort of assumed they cached like 30 inferences and just repeat them, but maybe I'm being too cynical.
PeterStuer•1h ago
Why? This is orthogonal to how LLMs work, and trivially solved by a minimal hybrid front/sub system.
em3rgent0rdr•1h ago
To gauge.
bayindirh•1h ago
Because LLMs are touted as the silver bullet of silver bullets. Built upon the world's knowledge, and with the capacity to call upon updated information with agents, they ought to rival the top programmers as of 3 days ago.
awkwam•48m ago
They might be touted like that, but it seems like you don't understand how they work. The example in the article shows that the prompt is limiting the LLM by giving it access to only 2000 tokens and also saying "ONLY OUTPUT ...". This is like me asking you to solve the same problem but forcing you to deactivate half of your brain + forget any programming experience you have. It's just stupid.
bayindirh•39m ago
> like you don't understand how they work.

I would not make such assumptions.

> The example in the article shows that the prompt is limiting the LLM by giving it access to only 2000 tokens and also saying "ONLY OUTPUT ..."

The site is pretty simple, method is pretty straightforward. If you believe this is unfair, you can always build one yourself.

> It's just stupid.

No, it's a great way of testing things within constraints.

em3rgent0rdr•1h ago
Most look like they were done by a beginner programmer on crack, but every once in a while a correct one appears.
morkalork•1h ago
I'd say more like a blind programmer in the early stages of dementia. Able to write code, unable to form a mental image of what it would render as and can't see the final result.
pixl97•1h ago
DeepSeek and Kimi seem to have correct ones most of the time I've looked.
em3rgent0rdr•1h ago
yes, and sometimes Grok.
pixl97•40m ago
The hour hand commonly seems off on Grok.
BrandoElFollito•3m ago
DeepSeek told me that it cannot generate pictures and suggested code (which is very different)
shafoshaf•1h ago
It's interesting how drawing a clock is one of the primary signals for dementia. https://www.verywellhealth.com/the-clock-drawing-test-98619
BrandoElFollito•4m ago
This is very interesting, thank you.

I could not get to the story because of the cookie banner, which does not work (at least on mobile Chrome and FF). The Internet Archive page: https://archive.ph/qz4ep

I wonder how this test could be modified for people that have neurological problems - my father's hands shake a lot but I would like to try the test on him (I do not have suspicions, just curious).

I passed it :)

energy123•27m ago
If they can identify which one is correct, then it's the same as always being correct, just with an expensive compute budget.
larodi•1h ago
would be great to also see the prompt this was done with
creade•1h ago
The "?" on the site has "Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."
bananatron•1h ago
grok's looks like one of those clocks you'd find at a novelty shop
AlfredBarnes•1h ago
It's cool to see them get it right... sometimes.
zkmon•1h ago
Why are DeepSeek and Kimi beating the other models by such a margin? Is this to do with their specialization for this task?
baltimore•1h ago
Since the first (good) image generation models became available, I've been trying to get them to generate an image of a clock with 13 instead of the usual 12 hour divisions. I have not been successful. Usually they will just replace the "12" with a "13" and/or mess up the clock face in some other way.

I'd be interested if anyone else is successful. Share how you did it!

snek_case•1h ago
From my experience they quickly fail to understand anything beyond a superficial description of the image you want.
atorodius•43m ago
That's less and less true

https://minimaxir.com/2025/11/nano-banana-prompts/

dang•42m ago
Related ongoing thread:

Nano Banana can be prompt engineered for nuanced AI image generation - https://news.ycombinator.com/item?id=45917875 - Nov 2025 (214 comments)

Scene_Cast2•1h ago
I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).
IAmGraydon•58m ago
That's because they literally cannot do that. Doing what you're asking requires an understanding of why the numbers on the clock face are where they are and what it would mean if there were an extra hour on the clock (i.e., that you would have to divide 360 by 13 to begin to understand where the numbers would go). AI models have no concept of anything that's not included in their training data. Yet people continue to anthropomorphize this technology and are surprised when it becomes obvious that it's not actually thinking.
bobbylarrybobby•54m ago
It's interesting because if you asked them to write code to generate an SVG of a clock, they'd probably use a loop from 1 to 12, using sin and cos of the angle (given by the loop index over 12 times 2pi) to place the numerals. They know how to do this, and so they basically understand the process that generates a clock face. And extrapolating from that to 13 hours is trivial (for a human). So the fact that they can't do this extrapolation on their own is very odd.
echelon•53m ago
gpt-image-1 and Google Imagen understand prompts, they just don't have training data to cover these use cases.

gpt-image-1 and Imagen are wickedly smart.

The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

phkahler•21m ago
>> The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

That's great, but I bet it can't tie its own shoes.

energy123•32m ago
The hope was for this understanding to emerge as the most efficient solution to the next-token prediction problem.

Put another way, it was hoped that once the dataset got rich enough, developing this understanding is actually more efficient for the neural network than memorizing the training data.

The useful question to ask, if you believe the hope is not bearing fruit, is why. Point specifically to the absent data or the flawed assumption being made.

Or more realistically, put in the creative and difficult research work required to discover the answer to that question.

ryandrake•29m ago
I wonder if you would have more success if you painstakingly described the shape and features of a clock in great detail but never used the words clock or time or anything that might give the AI the hint that they were supposed to output something like a clock.
BrandoElFollito•18m ago
And this is a problem for me. I guess that it would work, but as soon as the word "clock" appears, gone is the request because a clock HAS.12.HOURS.

I use this a lot in cybersecurity when I need to do something "illegal". I am refused help, until I say that I am doing research on cybersecurity. In that case no problem.

Workaccount2•10m ago
The problem is more likely the tokenization of images than anything. These models do their absolute worst when pictures are involved, but are seemingly miraculous at generalizing with just text.
godelski•8m ago
Yes, the problem is that these so called "world models" do not actually contain a model of the world, or any world
echelon•55m ago
That's just a patch to the training data.

Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.

rideontime•18m ago
What would the "patch" be? Manually create some images of 13-hour clocks and add them to the training data? How does that solution scale?
godelski•9m ago
s/13/17/g ;)
coffeecoders•50m ago
LLMs are terrible at out-of-distribution (OOD) tasks. You should use chain-of-thought suppression and give constraints explicitly.

My prompt to Grok:

---

Follow these rules exactly:

- There are 13 hours, labeled 1–13.

- There are 13 ticks.

- The center of each number is at angle: index * (360/13)

- Do not infer anything else.

- Do not apply knowledge of normal clocks.

Use the following variables:

HOUR_COUNT = 13

ANGLE_PER_HOUR = 360 / 13 // 27.692307°

Use index i ∈ [0..12] for hour marks:

angle_i = i * ANGLE_PER_HOUR

I want html/css (single file) of a 13-hour analog clock.

---

Output from grok.

https://jsfiddle.net/y9zukcnx/1/

BrandoElFollito•39m ago
Well, that's cheating :) You asked it to generate code, which is OK, but it's not a directly generated image of a clock.

Can grok generate images? What would the result be?

I will try your prompt on chatgpt and gemini

BrandoElFollito•33m ago
Gemini failed miserably - a standard 12 hours clock

Same for chatgpt

And perplexity replaced 12 with 13

dwringer•13m ago
> Please create a highly unusual 13-hour analog clock widget, synchronized to system time, with fully animated hands that move in real time, and not 12 but 13 hour markings - each will be spaced at not 5-minute intervals, but at 4-minute-37-second intervals. This makes room for all 13 hour markings. Please pay attention to the correct alignment of the 13 numbers and the 13 hour marks, as well as the alignment of the hands on the face.

This gave me a correct clock face on Gemini- after the model spent a lot of time thinking (and kind of thrashing in a loop for a while). The functionality isn't quite right, not that it entirely makes sense in the first place, but the face - at least in terms of the hour marks - looks OK to me.[0]

[0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
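As an aside, the "4-minute-37-second intervals" figure in that prompt checks out: 13 evenly spaced marks sit 60/13 minutes apart on the minute track, and 360/13 degrees apart on the dial.

```python
# Verify the spacing figures implied by a 13-hour clock face.
spacing_minutes = 60 / 13                           # ~4.615 minutes between marks
whole_min = int(spacing_minutes)                    # 4
leftover_sec = (spacing_minutes - whole_min) * 60   # ~36.9 s, i.e. "4 min 37 s"
angle_deg = 360 / 13                                # ~27.69 degrees between marks

print(whole_min, round(leftover_sec), round(angle_deg, 2))  # → 4 37 27.69
```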

chemotaxis•33m ago
> Follow these rules exactly:

"Here's the line-by-line specification of the program I need you to write. Write that program."

chiwilliams•25m ago
I'll also note that the output isn't quite right --- the top number should be 13 rather than 1!
layer8•3m ago
I mean, the specification for the hour marks (angle_i) starts with a mark at angle 0. It just followed that spec. ;)
BrandoElFollito•42m ago
This is really cool. I tried to prompt gemini but every time I got the same picture. I do not know how to share a session (like it is possible with Chatgpt) but the prompts were

If a clock had 13 hours, what would be the angle between two of these 13 hours?

Generate an image of such a clock

No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above

This is the same image. There need to be 13 hour marks around the dial, evenly spaced

... And its last answer was

You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.

Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.

And the very same clock, with 12 hours, and a 13th above the 12...

ryandrake•32m ago
This is probably my biggest problem with AI tools, having played around with them more lately.

"You're absolutely right! I made a mistake. I have now comprehensively solved this problem. Here is the corrected output: [totally incorrect output]."

None of them ever seem to have the ability to say "I cannot seem to do this" or "I am uncertain if this is correct, confidence level 25%" The only time they will give up or refuse to do something is when they are deliberately programmed to censor for often dubious "AI safety" reasons. All other times, they come back again and again with extreme confidence as they totally produce garbage output.

BrandoElFollito•21m ago
I agree; I see the same even in simple code, where they will bend over backwards apologizing and generate very similar crap.

It is like they are sometimes stuck in a local energetic minimum and will just wobble around various similar (and incorrect) pictures.

What was annoying in my attempt above is that the picture was identical for every attempt

ryandrake•9m ago
These tools' "attitude" reminds me of an eager but incompetent intern, or a poorly trained administrative assistant who works for a powerful CEO. All sycophancy, confidence, and positive energy, but not really getting much done.
deathanatos•33m ago

  Generate an image of a clock face, but instead of the usual 12 hour numbering, number it with 13 hours. 

Gemini, 2.5 Flash or "Nano Banana" or whatever we're calling it these days. https://imgur.com/a/1sSeFX7

A normal (ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).

bar000n•22m ago
It should be pretty clear already that anything which is based on (limited to?) communicating in words/text can never grasp conceptual thinking.

We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

rideontime•19m ago
Really? I can grasp the concept behind that command just fine.
bayindirh•14m ago
> We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

We have a very comprehensive and precise spec for that [0].

If you don't want to hop through the certificate warning, here's the transcript:

- Some day, we won't even need coders any more. We'll be able to just write the specification and the program will write itself.

- Oh wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more.

- Exactly

- And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?

- Uh... no...

- Code, it's called code.

[0]: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...

giancarlostoro•18m ago
Weird, I never tried that. I tried all the usual tricks that usually work, including swearing at the model (this scarily works surprisingly well with LLMs), and nothing. I even tried to go the opposite direction: a 6-hour clock.
abathologist•1h ago
This is great. If you think that the phenomenon of human-like text generation evinces human-like intelligence, then this should be taken to evince that the systems likely have dementia. https://en.wikipedia.org/wiki/Montreal_Cognitive_Assessment
AIorNot•58m ago
Imagine if I asked you to draw and operate a clock pixel by pixel via HTML, or to create a JPEG with pencil and paper, and have it be accurate. I suspect your hand-coded work would be off by an order of magnitude by comparison.
jonplackett•1h ago
kimi is kicking ass
busymom0•1h ago
Because a new clock is generated every minute, it looks like simply changing the time by a digit causes the result to be significantly different from the previous iteration.
shevy-java•1h ago
Now that is actually creative.

Granted, it is not a clock - but it could be art. It looks like a Picasso. When he was drunk. And took some LSD.

kburman•1h ago
These types of tests are fundamentally flawed. I was able to create a perfect clock using Gemini 2.5 Pro - https://gemini.google.com/share/136f07a0fa78
sinak•1h ago
How are they flawed?
earthnail•58m ago
The results are not reproducible, as evidenced by the parent poster.
micromacrofoot•51m ago
isn't that kind of the point of non-determinism?
jmdeon•57m ago
Aren't they attempting to also display current time though? Your share is a clock starting at midnight/noon. Kimi K2 seems to be the best on each refresh.
Drew_•53m ago
The website is regenerating the clocks every minute. When I opened it, Gemini 2.5 was the only working one. Now, they are all broken.

Also, your example is not showing the current time.

system2•36m ago
It wouldn't be hard to tell it to pick up browser time as the default starting point. Just a piece of prompt.
allenu•52m ago
I don't think this is a serious test. It's just an art piece to contrast different LLMs taking on the same task, and against themselves since it updates every minute. One minute one of the results was really good for me and the next minute it was very, very bad.
dwringer•49m ago
Even Gemini Flash did really well for me[0] using two prompts - the initial query and one to fix the only error I could identify.

> Please generate an analog clock widget, synchronized to actual system time, with hands that update in real time and a second hand that ticks at least once per second. Make sure all the hour markings are visible and put some effort into making a modern, stylish clock face.

Followed by:

> Currently the hands are working perfectly but they're translated incorrectly, making them uncentered. Can you ensure that each one is translated to the correct position on the clock face?

[0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

lxe•1h ago
Honestly, I think if you track the performance of each over time, since these get regenerated once in a while, you can then have a very, very useful and cohesive benchmark.
1yvino•1h ago
i wonder if the qwen output would look like a hallucination?
fschuett•56m ago
Reminds me of this: https://www.youtube.com/watch?v=OGbhJjXl9Rk
S0y•56m ago
To be fair, this is a deceptively hard task.
bobbylarrybobby•52m ago
Without AI assistance, this should take ~10–15 minutes for a human. Maybe add 5 minutes if you're not allowed to use d3.
alexmorley•39m ago
It's just html/css so no js at all let alone d3.
zkmon•54m ago
Was Claude banned from this Olympics?
giancarlostoro•14m ago
Haiku is the lightweight Claude model; I'm not sure why they picked the weaker one.
collimarco•53m ago
In any case those clocks are all extremely inaccurate, even if AI could build a decent UI (which is not the case).

Some months ago I published this site for fun: https://timeutc.com There's a lot of code involved to make it precise to the ms, including adjusting based on network delay, frame refresh rate instead of using setTimeout and much more. If you are curious take a look at the source code.
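For the curious, the core of that kind of network-delay adjustment is usually an NTP-style round-trip correction. A minimal sketch (an illustration under the usual symmetric-delay assumption, not the site's actual code):

```python
def estimate_offset(t_send: float, server_time: float, t_recv: float) -> float:
    """Estimate (server clock - local clock) in ms from one request/response.

    Assumes the one-way network delay is symmetric, so the server's
    timestamp was taken roughly halfway through the round trip."""
    round_trip = t_recv - t_send
    midpoint_local = t_send + round_trip / 2
    return server_time - midpoint_local

# Example: local clock 120 ms behind the server, 80 ms round trip.
offset = estimate_offset(t_send=1000.0, server_time=1160.0, t_recv=1080.0)
print(offset)  # → 120.0; add this to the local time before rendering the hands
```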

mstipetic•52m ago
GPT-5 is embarrassing itself. Kimi and DeepSeek are very consistently good. Wild that you can just download these models.
shubham_zingle•51m ago
not sure about the accuracy though; I'm shooting in the dark
awkwam•41m ago
Limiting the model to only 2000 tokens while also asking it to output ONLY HTML/CSS is just stupid. It's like asking a programmer to perform the same task after removing half their brain and making them forget their programming experience. This is a stupid and meaningless benchmark.
system2•38m ago
Ask Claude or ChatGPT to write it in Python, and you will see what they are capable of. HTML + CSS has never been the strong suit of any of these models.
camalouu•5m ago
Claude generates some JS/CSS stuff even when I don't ask for it. I think Claude itself, at least, believes it is good at this.
munro•28m ago
Amazing. Some people are so enamored with LLMs, using them for soft outcomes, that they disagree with me when I say to be careful, they're not perfect. This is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks: "Hey, this test is failing"; the LLM deletes the test; "FIXED!"
novemp•24m ago
Oh cool, it's the schizophrenia clock-drawing test but for AI.
otterley•21m ago
Watching this over the past few minutes, it looks like Kimi K2 generates the best clock face most consistently. I'd never heard of that model before today!

Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.

bArray•11m ago
It could be that the prompt is accidentally (or purposefully) more optimised for Kimi K2, or that Kimi K2 is better trained on this particular data. LLMs need "prompt engineers" for a reason: to get the most out of a particular model.
frizlab•7m ago
I knew of Kimi K2 because it's the model used by Kagi to generate the AI answers when a query ends with a question mark.
abixb•2m ago
>Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.

More like fell headfirst into the ground.

earth2mars•19m ago
https://gemini.google.com/share/00967146a995 works perfectly fine with gemini 2.5 pro
lanewinfield•16m ago
nice. I restrict to 2000 tokens for mine, how many was that?
lanewinfield•19m ago
hi, I made this. thank you for posting.

I love clocks and I love finding the edges of what any given technology is capable of.

I've watched this for many hours. Kimi frequently gets the most accurate clock, but it also has the least variation and is the most boring. Qwen is oftentimes the most insane and makes me laugh. Which one is "better"?

anigbrowl•8m ago
I really like this. The broken ones are sometimes just failures, but sometimes provide intriguing new design ideas.
ryandrake•15m ago
I've been struggling all week trying to get Claude Code to write code to produce visual (not the usual, verifiable, text on a terminal) output in the form of a SDL_GPU rendered scene consisting of the usual things like shaders, pipelines, buffers, textures and samplers, vertex and index data and so on, and boy it just doesn't seem to know what it's doing. Despite providing paragraphs-long, detailed prompts. Despite describing each uniform and each matrix that needs to be sent. Despite giving it extremely detailed guidance about what order things need to be done in. It would have been faster for me to just write the code myself.

When it fails a couple of times it will try to put logging in place and then confidently tell me things like "The vertex data has been sent to the renderer, therefore the output is correct!" When I suggest it take a screenshot of the output each time to verify correctness, it does, and then declares victory over an entirely incorrect screenshot. When I suggest it write unit tests, it does so, but the tests are worthless and only tests that the incorrect code it wrote is always incorrect in the same ways.

When it fails even more times, it will get into this what I like to call "intern engineer" mode where it just tries random things that I know are not going to work. And if I let it keep going, it will end up modifying the entire source tree with random "try this" crap. And each iteration, it confidently tells me: "Perfect! I have found the root cause! It is [garbage bullshit]. I have corrected it and the code is now completely working!"

These tools are cute, but they really need to go a long way before they are actually useful for anything more than trivial toy projects.

fancy_pantser•9m ago
Have you given using MCPs to provide documentation and examples a shot? I always have to bring in docs since I don't work in Python and TS+React (which it seems more capable at) and force it to review those in addition to any specification. e.g. Context7
paxys•9m ago
Something I'm not able to wrap my head around is that Kimi K2 is the only model that produces a ticking second hand on every attempt while the rest of them are always moving continuously. What fundamental differences in model training or implementation can result in this disparity? Or was this use case programmed in K2 after the fact?
aavshr•4m ago
just curious, why not the sonnet models? In my personal experience, Anthropic's Sonnet models are the best when it comes to things like this!
xyproto•1m ago
Try adding to the prompt that it has a PhD in Computer Science and has many methods for dealing with complexity.

This gives better results, at least for me.