9 AIs × 43,200 minutes = 388,800 requests/month
388,800 requests × 200 tokens = 77,760,000 tokens/month ≈ 78M tokens
Cost varies from 10 cents to $1 per 1M tokens.
Using the mid-price, the cost is around $50/month.
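For concreteness, here's a quick back-of-the-envelope sketch of that estimate (the 200 tokens per response and the $0.10 to $1 price range are the assumptions above, not measured values):

```python
# Rough monthly cost estimate for 9 models regenerating a clock every minute.
MODELS = 9
MINUTES_PER_MONTH = 60 * 24 * 30            # 43,200
TOKENS_PER_RESPONSE = 200                    # assumed average output size

requests = MODELS * MINUTES_PER_MONTH        # 388,800 requests/month
tokens = requests * TOKENS_PER_RESPONSE      # 77,760,000 tokens/month

for price in (0.10, 0.55, 1.00):             # $ per 1M output tokens: low / mid / high
    print(f"${price:.2f}/1M tokens -> ${tokens / 1e6 * price:,.0f}/month")
# $0.10/1M tokens -> $8/month
# $0.55/1M tokens -> $43/month
# $1.00/1M tokens -> $78/month
```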
---
Hopefully, the OP has this endpoint protected - https://clocks.brianmoore.com/api/clocks?time=11:19AM
It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.
What changes to the prompt are you referring to?
According to the comment on the site, the prompt is the following:
Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.
The prompt doesn't seem to change.
In a world where JavaScript and Electron are still getting (again, rightfully) skewered for inefficiency despite often exceeding the performance of many compiled languages, we should not dismiss the discussion around efficiency so easily.
60 × 24 × 30 ≈ 43k AI calls per month per model. Let's suppose there are 1,000 output tokens (might it be 10k tokens? Seems like a lot for this task). So ~43M tokens per model.
The price for 1m output tokens[0] ranges from $.10 (qwen-2.5) to $60 (GPT-4). So $4/mo for the cheapest, and $2.5k/mo for the most expensive.
So this might cost several thousand dollars a month? Something smells funny. But you're right, throttling it to once an hour would achieve a similar goal and likely cost less than $100/mo (which is still more than I would spend on a project like this).
But I presume you light up Christmas lights in December, drive to the theater to watch a movie, or fire up a campfire on holiday. That too is "wasteful": it's not needed, and other, far more efficient ways exist to achieve the same. And in absolute numbers, it's far more energy intensive than running an LLM to create 9 clocks every minute. We do things to learn, have fun, be weird, make art, or just spend time.
Now, if Rolex starts building watches by running an LLM to drive its production machines or if we replace millions of wall clocks with ones that "Run an LLM every second", then sure, the waste is an actual problem.
The point I'm trying to make is that it's OK to consider or debate the energy use of LLMs compared to alternatives. But bringing up that debate in a context where someone is being creative, or having a fun time, is not, IMO. Because a lot of "fun" activities use a lot of energy, and that too isn't automatically "wasteful".
I would not make such assumptions.
> The example in the article shows that the prompt is limiting the LLM by giving it access to only 2000 tokens and also saying "ONLY OUTPUT ..."
The site is pretty simple and the method is pretty straightforward. If you believe this is unfair, you can always build one yourself.
> It's just stupid.
No, it's a great way of testing things within constraints.
I could not get to the site because of the cookie banner that does not work (at least on mobile Chrome and FF). The Internet Archive page: https://archive.ph/qz4ep
I wonder how this test could be modified for people that have neurological problems - my father's hands shake a lot but I would like to try the test on him (I do not have suspicions, just curious).
I passed it :)
Hmm, ambiguity. I would be the smart ass that drew a digital clock for them, or a shaku-dokei.
I hate prompt discovery (I'm not calling this thing engineering!), but it actually matters.
I'd be interested if anyone else is successful. Share how you did it!
Nano Banana can be prompt engineered for nuanced AI image generation - https://news.ycombinator.com/item?id=45917875 - Nov 2025 (214 comments)
A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!
Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.
Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?
Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.
Like, the response to "... The surgeon (who is male and is the boy's father) says: I can't operate on this boy! He's my son! How is this possible?" used to be "The surgeon is the boy's mother"
The response to "... At each door is a guard, each of which always lies. What question should I ask to decide which door to choose?" would be an explanation of how asking the guard what the other guard would say would tell you the opposite of which door you should go through.
So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.
And the Image models which are based on multi-mode LLMs (like Nano Banana) seem to do a lot better at novel concepts.
They are just struggling to produce good results because they are language models and don’t have great spatial reasoning skills, because they are language models.
Their output normally has all the elements, just not in the right place/shape/orientation.
For example, try asking Nano Banana to do something simpler, like "draw a picture of 13 circles." It likely will not work.
gpt-image-1 and Imagen are wickedly smart.
The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.
That's great, but I bet it can't tie its own shoes.
It's a part of my daily tool box.
Put another way, it was hoped that once the dataset got rich enough, developing this understanding is actually more efficient for the neural network than memorizing the training data.
The useful question to ask, if you believe the hope is not bearing fruit, is why. Point specifically to the absent data or the flawed assumption being made.
Or more realistically, put in the creative and difficult research work required to discover the answer to that question.
I use this a lot in cybersecurity when I need to do something "illegal". I am refused help, until I say that I am doing research on cybersecurity. In that case no problem.
For text, "generalization" is still "generate text that conforms to all the usual rules of the language". For images of 13-hour clock faces, we're explicitly asking the LLM to violate the inferred rules of the universe.
I think a good analogy would be asking an LLM to write in English, except the word "the" now means "purple". They will struggle to adhere to this prompt in a conversation.
However humans are pretty adept at discerning images, even ones outside the norm. I really think there is some kind of architectural block hampering transformers' ability to really "see" images. For instance if you show any model a picture of a dog with 5 legs (a fifth leg photoshopped to its belly) they all say there are only 4 legs. And will argue with you about it. Hell, GPT-5 even wrote a leg detection script in Python (impressive) which detected the 5 legs, and then it said the script was bugged, and modified the parameters until one of the legs wasn't detected, lol.
You probably mean the "long s" that looks like an "f".
Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.
My prompt to Grok:
---
Follow these rules exactly:
- There are 13 hours, labeled 1–13.
- There are 13 ticks.
- The center of each number is at angle: index * (360/13)
- Do not infer anything else.
- Do not apply knowledge of normal clocks.
Use the following variables:
HOUR_COUNT = 13
ANGLE_PER_HOUR = 360 / 13 // 27.692307°
Use index i ∈ [0..12] for hour marks:
angle_i = i * ANGLE_PER_HOUR
I want html/css (single file) of a 13-hour analog clock.
---
Output from grok.
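(Not Grok's output, just a quick Python sketch of my own to sanity-check the geometry the prompt asks for. Putting "13" at the top, like "12" on a normal clock, is my assumption; the prompt leaves label placement open.)

```python
import math

HOUR_COUNT = 13
ANGLE_PER_HOUR = 360 / HOUR_COUNT           # ~27.692 degrees between hour marks
RADIUS = 100                                 # arbitrary dial radius in px

for i in range(HOUR_COUNT):                  # i in [0..12], as in the prompt
    angle = i * ANGLE_PER_HOUR               # degrees clockwise from the top
    theta = math.radians(angle)
    # CSS-style coordinates: x grows right, y grows down, so "up" is negative y.
    x = RADIUS * math.sin(theta)
    y = -RADIUS * math.cos(theta)
    label = 13 if i == 0 else i              # assume "13" sits at the top
    print(f"{label:2d}: {angle:7.3f} deg -> ({x:7.1f}, {y:7.1f})")
```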
Can grok generate images? What would the result be?
I will try your prompt on chatgpt and gemini
Same for chatgpt
And perplexity replaced 12 with 13
This gave me a correct clock face on Gemini, after the model spent a lot of time thinking (and kind of thrashing in a loop for a while). The functionality isn't quite right, not that it entirely makes sense in the first place, but the face - at least in terms of the hour marks - looks OK to me.[0]
[0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
"Here's the line-by-line specification of the program I need you to write. Write that program."
If a clock had 13 hours, what would be the angle between two of these 13 hours?
Generate an image of such a clock
No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above
This is the same image. There need to be 13 hour marks around the dial, evenly spaced
... And its last answer was
You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.
Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.
And the very same clock, with 12 hours, and a 13th above the 12...
"You're absolutely right! I made a mistake. I have now comprehensively solved this problem. Here is the corrected output: [totally incorrect output]."
None of them ever seem to have the ability to say "I cannot seem to do this" or "I am uncertain if this is correct, confidence level 25%" The only time they will give up or refuse to do something is when they are deliberately programmed to censor for often dubious "AI safety" reasons. All other times, they come back again and again with extreme confidence as they totally produce garbage output.
It is like they are sometimes stuck in a local energetic minimum and will just wobble around various similar (and incorrect) answers.
What was annoying in my attempt above is that the picture was identical for every attempt
I wonder how it would do if instead it were told "Do not tell me at the start that the solution is going to be correct. Instead, tell me the solution, and at the end tell me if you think it's correct or not."
I have found that on certain logic puzzles that it simply cannot get right, it always tells me that it's going to get it right "this last time," but if asked later it always recognizes its errors.
https://www.reddit.com/r/artificial/comments/1mp5mks/this_is...
i'm curious if the clock image it was giving you was the same one it was giving me
No, my clock was an old style one, to be put on a shelf. But at least it had a "13" proudly right above the "12" :)
This reminds me of my kids when they were in kindergarten and brought home art that needed extra explanation to realize what it was. But they were very proud!
Generate an image of a clock face, but instead of the usual 12 hour numbering, number it with 13 hours.
Gemini 2.5 Flash, or "Nano Banana", or whatever we're calling it these days: https://imgur.com/a/1sSeFX7A

A normal(ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).
We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.
We have a very comprehensive and precise spec for that [0].
If you don't want to hop through the certificate warning, here's the transcript:
- Some day, we won't even need coders any more. We'll be able to just write the specification and the program will write itself.
- Oh wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more.
- Exactly
- And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?
- Uh... no...
- Code, it's called code.
[0]: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...
If you're actually doing real work you have nothing to fear from LLMs, because any prompt which is specific enough to create a given computer program is going to be comparable in terms of complexity and effort to having done it yourself.
https://claude.ai/public/artifacts/0f1b67b7-020c-46e9-9536-c...
> The farmer and the goat are going to the river. They look into the sky and see three clouds shaped like: a wolf, a cabbage and a boat that can carry the farmer and one item. How can they safely cross the river?
Most of them just give the answer to the well-known river crossing riddle. Some "feel" that something is off, but still have a hard time figuring out that the wolf, boat and cabbage are just clouds.
https://www.reddit.com/r/singularity/comments/1fqjaxy/contex...
Maybe older models?
I tried it again yesterday with GPT. GPT-5 manages quite well too in thinking mode, but starts crackling in instant mode. 4o completely failed.
It's not that LLMs are unable to solve things like that at all, but it's really easy to find some variations that make them struggle really hard.
ChatGPT made a nice looking clock with matplotlib that had some bugs that it had to fix (hours were counter-clockwise). Gemini made correct code one-shot, it used Pillow instead of matplotlib, but it didn't look as nice.
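For comparison, here's a minimal matplotlib sketch of my own (not what ChatGPT produced) for drawing a clock face at a given time. The (90 - clock_angle) conversion between "clockwise from 12" and matplotlib's counter-clockwise-from-3-o'clock convention is exactly the kind of detail that, when dropped, gives you the counter-clockwise hours bug:

```python
import numpy as np
import matplotlib.pyplot as plt

def draw_clock(hour, minute, second=0):
    """Draw a simple analog clock face showing the given time."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.set_aspect("equal")
    ax.axis("off")
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, lw=2))  # the dial

    # Clock angles run clockwise from 12; matplotlib angles run counter-clockwise
    # from 3 o'clock, hence the (90 - angle) conversion. Getting this wrong is an
    # easy way to end up with hours laid out counter-clockwise.
    def to_xy(clock_deg, r):
        theta = np.deg2rad(90 - clock_deg)
        return r * np.cos(theta), r * np.sin(theta)

    for i in range(12):                               # hour ticks and numbers
        (x0, y0), (x1, y1) = to_xy(i * 30, 0.9), to_xy(i * 30, 1.0)
        ax.plot([x0, x1], [y0, y1], color="black", lw=2)
        tx, ty = to_xy(i * 30, 0.78)
        ax.text(tx, ty, str(12 if i == 0 else i), ha="center", va="center")

    hands = [((hour % 12 + minute / 60) * 30, 0.50, 4),   # hour hand
             ((minute + second / 60) * 6, 0.75, 2),       # minute hand
             (second * 6, 0.85, 1)]                       # second hand
    for clock_deg, length, width in hands:
        hx, hy = to_xy(clock_deg, length)
        ax.plot([0, hx], [0, hy], color="black", lw=width)

    ax.set_xlim(-1.1, 1.1)
    ax.set_ylim(-1.1, 1.1)
    plt.show()

draw_clock(11, 19)
```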
My working theory is that they were trained really hard to generate 5 fingers on hands but their counting drops off quickly.
Granted, it is not a clock - but it could be art. It looks like a Picasso. When he was drunk. And took some LSD.
Also, your example is not showing the current time.
> Please generate an analog clock widget, synchronized to actual system time, with hands that update in real time and a second hand that ticks at least once per second. Make sure all the hour markings are visible and put some effort into making a modern, stylish clock face.
Followed by:
> Currently the hands are working perfectly but they're translated incorrectly making then uncentered. Can you ensure that each one is translated to the correct position on the clock face?
[0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
Some months ago I published this site for fun: https://timeutc.com There's a lot of code involved to make it precise to the ms, including adjusting based on network delay, frame refresh rate instead of using setTimeout and much more. If you are curious take a look at the source code.
I disagree, those tasks are perfect for LLMs, since a bug you can't verify isn't a problem when vibecoding.
> "Hey this test is failing", LLM deletes test, "FIXED!"
A nice continuation of the tradition of folk stories about supernatural entities like teapots or lamps that grant wishes and take them literally. "And that's why, kids, you should always review your AI-assisted commits."

What about when we don't know what it's supposed to look like?
Lately I've been wrestling with the fact that unlike, say, a generalized linear model fit to data with some inferential theory, we don't have a theory or model for the uncertainty about LLM products. We recognize when it's off about things we know are off, but don't have a way to estimate when it's off other than to check it against reality, which is probably the exception to how it's used rather than the rule.
It's why non-coders think it's doing an amazing job at software.
But it's worryingly why using it for research, where you necessarily don't know what you don't know, is going to trip up even smarter people.
[0] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
My intuition is that at the start, when I was like "choose one of these 10 or unknown", that unknown left a big gray area, so as I added more classes the model could say "I know it's not X, because it's more similar to Y"
I feel like in this case though, the broken clocks are broken because they don't serve the purpose of visually transmitting information; they do look like clocks, tho. I'm sure if you fed the output back into the LLM and asked what time it is, it would say IDK, or more likely make something up and be wrong (at least for the egregious ones where the hands are flying everywhere).
Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.
This experiment, however, clearly states the goal with this prompt: `Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.`
An LLM should be able to interpret that, and should be able to perform a wide range of tasks in that same style - countdown timers, clocks, calendars, floating quote bubble cycling through a list of 100 pithy quotations, etc. Individual, clearly defined elements should have complex representations in latent space that correspond to the human understanding of those elements. Tasks and operations and goals should likewise align with our understanding. Qwen 2.5 and some others clearly aren't modeling clocks very well, or maybe the html/css rendering latents are broken. If you pick a semantic axis (like analog clocks), you can run a suite of tests to demonstrate their understanding by using limited one-shot interactions.
Reasoning models can adapt on the fly, and are capable of cheating - one shots might have crappy representations for some contexts, but after a lot of repetition and refinement, as long as there's a stable, well represented proxy for quality somewhere in the semantics it understands, it can deconstruct a task to fundamentals and eventually reach high quality output.
These types of tests also allow us to identify mode collapses - you can use complex sophisticated prompting to get most image models to produce accurate analog clocks displaying any time, but in the simple one-shot tests, the models tend to only be able to produce the time 10:10, and you'll get wild artifacts and distortions if you try to force any other configuration of hands.
Image models are so bad at hands that they couldn't even get clock hands right, until recently anyway. Nano banana and some other models are much better at avoiding mode collapses, and can traverse complex and sophisticated compositions smoothly. You want that same sort of semantic generalization in text generating models, so hopefully some of the techniques cross over to other modalities.
I keep hoping they'll be able to use SAE or some form of analysis on static weight distributions in order to uncover some sort of structural feature of mode collapse, with a taxonomy of different failure modes and causes, like limited data, or corrupt/poisoned data, and so on. Seems like if you had that, you could deliberately iterate on, correct issues, or generate supporting training material to offset big distortions in a model.
we should call them "prompt witch doctors" or maybe "prompt alchemists".
Actual engineers have professional standards bodies and legal liability when they shirk and the bridge falls down or the plane crashes or your wiring starts on fire.
Software "engineers" are none of those things but can at least emulate the approaches and strive for reproducibility and testability. Skilled craftsman; not engineers.
Prompt "engineers" is yet another few steps down the ladder, working out mostly by feel what magic words best tickle each model, and generally with no understanding of what's actually going on under the hood. Closer to a chef coming up with new meals for a restaurant than anything resembling engineering.
The battle on the use of language around engineer has long been lost but applying it to the subjective creative exercise of writing prompts is just more job title inflation. Something doesn't need to be engineering to be a legitimate job.
> The battle on the use of language around engineer has long been lost

That's really the core of the issue: We're just having the age-old battle of prescriptivism vs descriptivism again. An "engineer", etymologically, is basically just "a person who comes up with stuff", one who is "ingenious". I'm tempted to say it's you prescriptivists who are making a "battle" out of this.

> subjective creative exercise of writing prompts

Implying that there are no testable results, no objective success or failure states? Come on man.

If physical engineers understood everything then standards would not have changed in many decades. Safety factors would be mostly unnecessary. Clearly not the case.
If this was enough all novel creation would be engineering and that's clearly not true. Engineering attempts to discover & understand consistent outcomes when a myriad of variables are altered, and the boundaries where the variables exceed a model's predictive powers - then add buffer for the unknown. Manipulating prompts (and much of software development) attempts to control the model to limit the number of variables to obtain some form of useful abstraction. Physical engineering can't do this.
Some of it is engineering-like, but I've also picked up a sixth sense when modifying prompts about what parts are affecting the behavior I want to modify for certain models, and that feels very witch doctory!
The more engineering-like part is essentially trying to RE a black box model's post-training, but that goes over some people's heads so I'm happy to help keep the "it's just voodoo and guessing" narrative going instead :)
It's one of those things where it feels like it'd be easy to get copycats even if there's a market
Of course, the service they really provide is for businesses to feel they "do AI", and whether or not they do real engineering is as relevant as if your favorite pornstars' boobs are real or not.
This matters more than you might think.
Horrifying prospect, tbh
Oh absolutely not! Only in engineering are you allowed to get called an engineer for no apparent reason; do that in any other white-collar field and you'd be behind bars for fraudulent claims.
More like fell headfirst into the ground.
I'm disappointed with Gemini 2.5 (not sure Pro or Flash) -- I've personally had _fantastic_ results with Gemini 2.5 Pro building PWA, especially since the May 2025 "coding update." [0]
[0] https://blog.google/products/gemini/gemini-2-5-pro-updates/
Lol, are you using AI to create fan translations of エロ漫画 (erotic manga)?
I wonder if that is some type of fallback for errors querying the model, or if K2 actually created the html/css to display that.
K2 hosted on Groq is pretty crazy for intelligence/second. (Low rate limits still, tho.)
I love clocks and I love finding the edges of what any given technology is capable of.
I've watched this for many hours and Kimi frequently gets the most accurate clock but also the least variation and is most boring. Qwen is often times the most insane and makes me laugh. Which one is "better?"
It would be really cool if I could zoom out and have everything scale properly!
Grok 4 and Kimi nailed it the first time for me, then only Kimi on the second pass.
I applaud you for spending money to get it done.
"Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."
I think you might have stumbled upon something surprisingly profound.
https://www.psychdb.com/cognitive-testing/clock-drawing-test
Or maybe something like https://www.youtube.com/watch?v=dhZxdV2naw8
Interestingly, clocks are also an easy tell for when you're dreaming, if you're a lucid dreamer; they never work normally in dreams.
For me personally, even light switches have been a huge tell in the past, so basically almost anything electrical.
I've always held the utterly unscientific position that this is because the brain only has enough GPU cycles to show you an approximation of what the dream world looks like, but to actually run a whole simulation behind the scenes would require more FLOPs than it has available. After all, the brain also needs to run the "player" threads: It's already super busy.
Stretching the analogy past the point of absurdity, this is a bit like modern video game optimizations: the mountains in the distance are just a painting on a surface, and the remote on that couch is just a messy blur of pixels when you look at it up close.
So the dreaming brain is like a very clever video game developer, I guess.
So the idea is to develop habits called “reality checks” when you are awake. You look for the broken clock kind of anomalies that the grandparent comment mentioned. You have to be open to the possibility of dreaming, which is hard to do.
Consider this difficulty. Are you dreaming?
…
…
How much time did it take to think “no”? Or did you even take this question seriously? Maybe because you are reading a hn comment about lucid dreams, that question is interpreted as an example instead of a genuine question worth investigating, right? That’s the difficulty. Try it again.
The key is that the habit you’re developing isn’t just the check itself — it’s the thinking that you have during the check, which should lead you to investigate.
You do these checks frequently enough you end up doing it in a dream. Boom.
There’s also an aspect of identifying recurring patterns during prelucidity. That’s why it helps to keep a dream journal for your non-lucid dreams.
There are other methods too.
Maybe reality is a world of broken clocks, and they only “work” in the simulation.
An amusing pattern that dates back to "1kg of steel is heavier of course" in GPT-3.5.
Obviously, humans failing in these ways ARE in the training set. So it should definitely affect LLM output.
Second: the failures go away with capability (raw scale, reasoning training, test-time compute), on seen and unseen tasks both. Which is a strong hint that the model was truly failing, rather than being capable of doing a task but choosing to faithfully imitate a human failure instead.
I don't think the influence of human failures in the training data on the LLMs is nil, but it's not just a surface-level failure repetition behavior.
When it fails a couple of times it will try to put logging in place and then confidently tell me things like "The vertex data has been sent to the renderer, therefore the output is correct!" When I suggest it take a screenshot of the output each time to verify correctness, it does, and then declares victory over an entirely incorrect screenshot. When I suggest it write unit tests, it does so, but the tests are worthless and only tests that the incorrect code it wrote is always incorrect in the same ways.
When it fails even more times, it will get into this what I like to call "intern engineer" mode where it just tries random things that I know are not going to work. And if I let it keep going, it will end up modifying the entire source tree with random "try this" crap. And each iteration, it confidently tells me: "Perfect! I have found the root cause! It is [garbage bullshit]. I have corrected it and the code is now completely working!"
These tools are cute, but they really need to go a long way before they are actually useful for anything more than trivial toy projects.
The screenshot method not working is unsurprising to me; VLMs' visual reasoning is very bad with details because (as far as I understand) they do not really have access to those details, just the image embedding and maybe an OCR'd transcript.
It's annoying but I prefer it to how Gemini gets depressed if it takes a few tries to make progress. Like, thanks for not gaslighting me, but now I'm feeling sorry for a big pile of numbers, which was not a stated goal in my prompt.
This gives better results, at least for me.
I'm not sure if this was the intent or not, but it sure highlights how unreliable LLMs are.
I know, developers do the same, but at least they check it into Git so they can notice their mistakes. Here is an opportunity for AI to call a Google Authentication on you, or anything else.
Create an interactive artifact of an analog clock face that keeps time properly.
https://claude.ai/public/artifacts/75daae76-3621-4c47-a684-d...
https://slate.com/human-interest/2016/07/martin-baas-giant-r...
I tried gpt-oss-20b (my go-to local) and it looks ok though not very accurate. It decided to omit numbers. It also took 4500 tokens while thinking.
I'd be interested in seeing it with some more token leeway, as well as comparing two or more similar prompts, like using "current time" instead of "${time}" and being more prescriptive about including numbers
Currently, at work, I'm using Cursor for something that has an OpenGL visualization program. It's incredibly frustrating trying to describe bugs to the AI because it is completely blind. Like I just wanna tell it "there's no line connecting these two points but there ought to be one!" or "your polygon is obviously malformed as it is missing a bunch of points and intersects itself" but it's impossible. I end up having to make the AI add debug prints to, say, print out the position of each vertex, in order to convince it that it has a bug. Very high friction and annoying!!!
You can also give it an MCP setup so that it can send a screenshot to the conversation, though I'm unsure if anyone has made an easy enough "take a screenshot of a specific window ID" kind of MCP server, so it may need to be built first
I guess you could also ask it to build that mcp for you...
YMMV with other models but Sonnet 4.5 is good with things like this - writing the code, "seeing" the output and then iterating on it.
I'm not sure what Qwen 2.5 is doing, but I've seen similar in contemporary art galleries.
AI-optimized <analog-clock>!
People expect perfection on first attempt. This took a brief joint session:
HI: define the custom element API design (attribute/property behavior) and the CSS parts
AI: draw the rest of the f… owl
Makes me think that LLMs are like people with dementia! Perhaps it's the best way to relate to an LLM?
Nothing could be relied upon to be deterministic, it was so funny to see it try to do operations.
Recently I re-ran it with newer models and was drastically better, especially with temperature tweaks.
It even made a Nietzsche clock (I saw one <body> </body> which was surprisingly empty).
It definitely wins the creative award.
Or regret: "why didn't we stop it when we could?"
Edit: the time may actually have been perfect now that I account for my isp's geo-located time zone
something like "You only have 1000 tokens. Generate an analog clock showing ${time}, with a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting"
If you could get a perfect clock several times for the identical prompt in fresh contexts with the same model then it'd be a better comparison. Potentially the ChatGPT site you're using though is doing some adjustments that the API fed version isn't.
https://entropytown.com/articles/2025-11-07-kimi-k2-thinking...
More seriously, I'd love to see how the models perform the same task with a larger token allowance.
Why is a new clock being rendered every minute? Or are AI models evolving and improving every minute?
The thing I always want from timezone tools is: “Let me simulate a date after one side has shifted but the other hasn’t.”
Humans do badly with DST offset transitions; computers do great with them.
no thinking: better clock but not current time (the prompt is confusing here though): https://imgur.com/a/kRK3Q18
I use 'Sonnet 4.5 thinking' and 'Composer 1' (Cursor) the most, so it would be interesting to see how such SOTA models perform in this task.
Great experiment!
Got it to work on gpt 3.5T w modified prompt (albeit not as good - https://pastebin.com/gjEVSEcJ)
`single html file, working analog clock showing current time, numbers positioned (aligned) correctly via trig calc (dynamic), all three hands, second hand ticks, 400px, clean AF aesthetic R/Greenberg Associates circa 2017. empathy, hci, define > design > implement.`
Place a baby elephant in the green chair
I cannot unsee what I saw and it is 21:30 here so I have an hour or so to eliminate the picture from my mind or I will have nightmares.